Write buffering

ABSTRACT

A hybrid storage system is described having a mixture of different types of storage devices comprising rotational drives, flash devices, SDRAM, and SRAM. The rotational drives are used as the main storage, providing the lowest cost per unit of storage memory. Flash memory is used as a higher-level cache for the rotational drives. Methods for managing multiple levels of cache for this storage system are provided, with a very fast Level 1 cache consisting of volatile memory (SRAM or SDRAM) and a non-volatile Level 2 cache using an array of flash devices. A method of distributing the data across the rotational drives to make caching more efficient is described. Efficient techniques are also described for flushing data from L1 cache and L2 cache to the rotational drives, taking advantage of concurrent flash device operations and concurrent rotational drive operations, and maximizing sequential accesses to the rotational drives rather than random accesses, which are relatively slower. The methods provided here may be extended to systems that have more than two cache levels.

CROSS-REFERENCE(S) TO RELATED APPLICATIONS

This application is a continuation of application Ser. No. 15/665,321, filed Jul. 31, 2017 and issuing Oct. 15, 2019 as U.S. Pat. No. 10,445,239, which is a continuation of application Ser. No. 14/689,045, filed Apr. 16, 2015 and issued as U.S. Pat. No. 9,734,067 on Aug. 15, 2017, which claims the benefit of and priority to U.S. Provisional App. No. 61/980,561, filed Apr. 16, 2014. This U.S. Provisional Application 61/980,561 is hereby fully incorporated herein by reference. U.S. application Ser. No. 14/689,045 is a continuation-in-part of application Ser. No. 14/217,436, filed Mar. 17, 2014 and issued as U.S. Pat. No. 9,430,386 on Aug. 30, 2016, which claims the benefit of and priority to App. No. 61/801,422, filed Mar. 15, 2013. U.S. application Ser. Nos. 15/665,321 and 14/689,045 and 14/217,436 and U.S. Provisional Application 61/801,422 are each hereby fully incorporated by reference herein.

BACKGROUND

Field

This invention relates to the management of data in a storage system having both volatile and non-volatile caches. It relates more specifically to the methods and algorithms used in managing multiple levels of caches for improving the performance of storage systems that make use of Flash devices as higher-level cache.

Description of Related Art

Typical storage systems comprising multiple storage devices usually assign a dedicated rotational or solid state drive as cache to a larger number of data drives. In such systems, the management of the drive cache is done by the host, and the overhead brought about by this contributes to degradation of the caching performance of the storage system. Prior approaches to improving the caching performance focus on the cache replacement policy being used. The most common replacement policy or approach to selecting victim data in a cache is the Least Recently Used (LRU) algorithm. Other solutions consider the frequency of access to the cached data, replacing less frequently used data first. Still other solutions keep track of the number of times the data has been written while in cache so that it is only flushed to the media once it reaches a certain write threshold. Others even separate read cache from write cache, offering the possibility for parallel read and write operations.

The use of non-volatile storage as cache has also been described in prior art, declaring that response time for such storage systems approaches that of a solid state storage rather than a mechanical drive. However, prior solutions that made use of non-volatile memory as cache did not take advantage of the architecture of the non-volatile memories that could have further increased the caching performance of the system. The storage system does not make any distinction between a rotational drive and a solid-state drive cache, thus failing to recognize possible improvements that can be brought about by the architecture of the solid-state drive. Accordingly, there is a need for a cache management method for hybrid storage systems that takes advantage of the characteristics of flash memory and the architecture of the solid-state drive.

SUMMARY

The present invention describes cache management methods for a hybrid storage device having volatile and non-volatile caches. Maximizing concurrent data transfer operations to and from the different cache levels, especially to and from the flash-based L2 cache, results in increased performance over conventional methods. Distributed striping is implemented across the rotational drives, maximizing parallel operations on multiple drives. The use of Fastest-To-Fetch and Fastest-To-Flush victim data selection algorithms side-by-side with the LRU algorithm results in further improvements in performance.

Flow of data to and from the caches and the storage medium is managed using a cache state-based algorithm, allowing the firmware application to choose the necessary state transitions that produce the most efficient data flow.

The present invention is described in several exemplary hybrid storage systems illustrated in FIGS. 1, 2, 3, and 4. The present invention is applicable to additional hybrid storage device architectures, wherein more details can be found in U.S. Pat. No. 7,613,876, entitled “Hybrid Multi-Tiered Caching Storage System”, which is incorporated herein by reference.

The methods through which read and write operations to the flash devices are improved are discussed in U.S. Pat. No. 7,506,098, entitled “Optimized Placement Policy for Solid State Storage Devices,” which is incorporated herein by reference. The present invention uses such access optimizations in caching.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating a hybrid storage device connected directly to the host and to the rotational drives through the storage controller's available IO interfaces according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating a hybrid storage device that is part of the host and connected directly or indirectly to hard disk drives through its IO interfaces according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating a hybrid storage device connected indirectly to the host and indirectly to the hard disk drives through its IO interfaces according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating a hybrid storage device connected indirectly to the host through a network and directly to the hard disk drives through its IO interfaces according to an embodiment of the present invention.

FIG. 5A shows data striping in a single drive storage system according to an embodiment of the present invention.

FIG. 5B shows data striping in a multiple drive storage system using sequential split without implementing parity checking according to an embodiment of the present invention.

FIG. 5C shows data striping in a multiple drive storage system using distributed stripes without implementing parity checking according to an embodiment of the present invention.

FIG. 5D shows data striping in a multiple drive storage system using distributed stripes and implementing parity checking according to an embodiment of the present invention.

FIG. 6 shows a cache line consisting of a collection of host LBA units according to an embodiment of the present invention.

FIG. 7 shows a process flow for initializing a hybrid storage device supporting data striping according to an embodiment of the present invention.

FIG. 8 shows a process flow for initializing a hybrid storage device supporting pre-fetch of data at boot-up according to an embodiment of the present invention.

FIG. 9A is a diagram illustrating a set-associative L2 cache according to an embodiment of the present invention.

FIG. 9B is a diagram illustrating a directly-mapped L2 cache according to an embodiment of the present invention.

FIG. 9C is a diagram illustrating a full-associative L2 cache according to an embodiment of the present invention.

FIG. 10 shows a cache line information table according to an embodiment of the present invention.

FIG. 11A shows a process flow for servicing host read commands according to an embodiment of the present invention.

FIG. 11B is a diagram illustrating host read command-related data flows according to an embodiment of the present invention.

FIG. 12A shows a process flow for servicing host write commands according to an embodiment of the present invention.

FIG. 12B is a diagram illustrating write command-related data flows according to an embodiment of the present invention.

FIG. 13 shows a process flow for freeing L1 cache according to an embodiment of the present invention.

FIG. 14 shows a diagram illustrating optimized fetching of data from L2 and flushing to HDD according to an embodiment of the present invention.

FIGS. 15A, 15B, 15C₁, 15C₂, 15D show the cache state transition table for Host to L1 data transfer according to an embodiment of the present invention.

FIGS. 16A and 16B show the cache state transition table for L1 to Host data transfer according to an embodiment of the present invention.

FIGS. 17A and 17B show the cache state transition table for L2 to L1 data transfer according to an embodiment of the present invention.

FIGS. 18A and 18B show the cache state transition table for L1 to L2 data transfer according to an embodiment of the present invention.

FIGS. 19A and 19B show the cache state transition table for hard disk drive to L1 data transfer according to an embodiment of the present invention.

FIGS. 20A and 20B show the cache state transition table for L1 to hard disk drive data transfer according to an embodiment of the present invention.

FIG. 21A shows an example initial state of L1 and L2 during normal operation before a power loss occurs.

FIG. 21B illustrates the step of flushing valid dirty data from L1 to L2 upon detection of external power loss, using a backup power source.

FIG. 21C shows the state of L1 and L2 before the backup power source is completely used up.

FIG. 21D shows the state of L1 and L2 upon next boot-up coming from an external power interruption. It also shows the step of copying valid dirty data from L2 to L1 in preparation for flushing to rotational drives or transferring to host.

FIG. 21E shows the state of L1 and L2 after the valid dirty data from L2 have been copied to L1.

FIG. 22 illustrates a hybrid storage device connected directly to the host and to the rotational drives through the storage controller's available IO interface DMA controllers, in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

A cache line is a unit of cache memory identified by a unique tag. A cache line consists of a number of host logical blocks identified by host logical block addresses (LBAs). Host LBA is the address of a unit of storage as seen by the host system. The size of a host logical block unit depends on the configuration set by the host. The most common size of a host logical block unit is 512 bytes, in which case the host sees storage in units of 512 bytes. The Cache Line Index is the sequential index of the cache line to which a specific LBA is mapped.

HDD LBA (Hard-Disk Drive LBA) is the address of a unit of storage as seen by the hard disk. In a system with a single drive, there is a one-to-one correspondence between the host LBA and the HDD LBA. In the case of multiple drives, host LBAs are usually distributed across the hard drives to take advantage of concurrent IO operations.

HDD Stripe is the unit of storage by which data are segmented across the hard drives. For example, if 32 block data striping is implemented across 4 hard drives, the first stripe (32 logical blocks) is mapped to the first drive, the second stripe is mapped to the second drive, and so on.

A Flash Section is a logical allocation unit in the flash memory which can be relocated independently. The section size is the minimum amount of allocation which can be relocated.

Directly-mapped, set-associative, and full-associative caching schemes can be used for managing the multiple cache levels. A cache line information table is used to store the multi-level cache states and to track valid locations of data. The firmware implements a set of cache state transition guidelines that dictate the sequences of data movements during host reads, host writes, and background operations.

FIG. 1 illustrates a hybrid storage device 101 connected directly to the host 112 and to the rotational drives 105 through the storage controller's available IO interface DMA controllers 107 and 106, respectively. The rotational drives 105 are connected to one or more IO interface DMA controllers 106 capable of transferring data between the drives 105 and the high-speed L1 cache (SDRAM) 104. Another set of IO interface DMA controllers 107 is connected to the host 112 for transferring data between the host 112 and the L1 cache 104. The Flash interface controller 108, on the other hand, is capable of transferring data between the L1 cache 104 and the L2 cache (flash devices) 103.

Multiple DMA controllers can be activated at the same time on both the storage IO interface and the Flash interface sides. Thus, it is possible to have simultaneous operations on multiple flash devices, and simultaneous operations on multiple rotational drives.

Data is normally cached in L1 104, being the fastest among the available cache levels. The IO interface DMA engine 107 connected between the host 112 and the DMA buses 110 and 111 is responsible for high-speed transfer of data between the host 112 and the L1 cache 104. There can be multiple IO interface ports connected to a single host and there can be multiple IO interface ports connected to different hosts. In the presence of multiple IO interface to host connections, dedicated engines are available in each IO interface port, allowing simultaneous data transfer operations between hosts and the hybrid device. The engines operate directly on the L1 cache memory, eliminating the need for temporary buffers and the extra data transfer operations associated with them.

For each level of cache, the firmware keeps track of the number of cache lines available for usage. It defines a maximum threshold of unused cache lines, which when reached causes it to either flush some of the used cache lines to the medium or copy them to a different cache level which has more unused cache lines available. When the system reaches that pre-defined threshold of unused L1 cache, it starts moving data from L1 104 to L2 cache 103. L2 cache is slower than L1 but usually has greater capacity. L2 cache 103 consists of arrays of flash devices 109. Flash interface 108 consists of multiple DMA engines 115 connected to multiple buses 116, which are in turn connected to the flash devices. Multiple operations on different or on the same flash devices can be triggered in the flash interface. Each engine operation involves a source and a destination memory. For L1 to L2 data movements, the flash interface engines copy data directly from the memory location of the source L1 cache to the physical flash blocks of the destination flash. For L2 to L1 data movements, the flash interface engines copy data directly from the physical flash blocks of the source flash to the memory location of the destination L1 cache.
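The threshold check described above can be sketched as follows. This is only an illustration under stated assumptions: the names CacheLevel and choose_relief_action are hypothetical, and the threshold is read here as a count of unused cache lines that triggers action once the number of unused lines falls to it.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    """Bookkeeping for one cache level (hypothetical structure)."""
    name: str
    free_lines: int   # unused cache lines currently available
    threshold: int    # assumed trigger: act when free_lines <= threshold

    def needs_relief(self) -> bool:
        return self.free_lines <= self.threshold


def choose_relief_action(l1: CacheLevel, l2: CacheLevel) -> str:
    """Pick what to do with used L1 lines once L1 hits its threshold.

    Returns "none", "move_l1_to_l2", or "flush_l1_to_hdd".
    """
    if not l1.needs_relief():
        return "none"
    # Copy to the other cache level only if it has more unused lines.
    if l2.free_lines > l1.free_lines:
        return "move_l1_to_l2"
    # Otherwise flush used lines to the medium (rotational drives).
    return "flush_l1_to_hdd"
```

For example, with an L1 threshold of 64, an L1 holding 32 unused lines and an L2 holding 5000 unused lines would select the L1 to L2 move, while an L2 holding only 10 unused lines would select a flush to the rotational drives.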

Transfers of data from L1 104 to hard disk drives 105 and vice versa are handled by the DMA controllers of the IO interfaces 106 connected to the hard disk drives 105. These DMA controllers operate directly on the L1 cache memories, again eliminating the need for temporary buffers. Data transfers between L2 103 and the hard disk drives 105 always go through L1 104. This requires that synchronization between L2 and L1 be built into the caching scheme.

Although FIG. 1 shows a system where the rotational drives 105 are outside the hybrid storage device 101 and connected via IO interfaces 106, slightly different architectures can also be used. For example, the rotational drives 105 can be part of the hybrid storage device 101 itself, connected to the storage controller 102 via a disk controller. Another option is to connect the rotational drives 105 to an IO controller connected to the hybrid storage controller 102 through one of its IO interfaces 106. Similarly, the connection to the host is not in any way limited to what is shown in FIG. 1. The hybrid storage device can also attach to the host through an external IO controller. It can also be attached directly to the host's network domain. More details of these various configurations can be found in FIGS. 1, 3, 4, 7, and 9 of U.S. Pat. No. 7,613,876, entitled “Hybrid Multi-Tiered Caching Storage System”.

In FIG. 2, the hybrid storage device 201 is part of the host system 202, acting as cache for a group of storage devices 203 and 205. In the example given, one of the IO interfaces 206 is connected directly to a hard disk drive 203. Another IO interface 207 is connected to another hybrid device 204, which is connected directly to another set of hard disk drives 205. Contrary to the example in FIG. 1, where the hybrid storage device is a slave device receiving IO commands from the host, translating them to subcommands delivered to the hard disk drives, and handling the caching in between these processes, FIG. 2 shows a host hybrid device doing caching of data on the host side using its own dedicated L1 and L2 caches. An example of this is a multi-ported HBA (Host Bus Adapter) with integrated L1 and L2 caches. From the HBA's point of view, it is connected to, and thus capable of caching, multiple storage devices regardless of whether or not the attached storage devices are also doing caching internally. The hybrid device intercepts IO requests coming from the host application and utilizes its built-in caches as necessary.

FIG. 3 is another variation of the architecture. In this case, a hybrid storage device 301 acts as a caching switch/bridge connected to the host 302 via another hybrid storage device 303, which is shown as an HBA. The hybrid storage device 301 is connected to a hybrid storage device 304 and also to a plain rotational drive 305. In this example, all three devices 301, 303, and 304 are capable of L1 and L2 caching.

In FIG. 4, the hybrid storage device 401 is directly connected to the network 402, to which the host 403 is also connected. In this mode, the hybrid storage device can be a network-attached storage or a network-attached cache to other more remote storage devices. If it is used as a pure cache, it can implement up to three levels of caches, L1 (SDRAM), L2 (Flash), and L3 (HDD).

In the example architectures illustrated, such as FIG. 1, the host can configure the hybrid storage device to handle virtualization locally. The hybrid storage device presents the whole storage system to the host as a single large storage without the host knowing the number and exact geometry of the attached rotational drives.

A firmware application running inside the hybrid storage device is responsible for the multi-level cache management.

Data Striping

If virtualization is implemented locally in the hybrid storage device, the device firmware can control the mapping of data across one or more rotational drives. Initially at first boot-up, the firmware will initialize the IO interfaces and detect the number and capacity of attached hard drives. It then selects the appropriate host LBA to HDD LBA mapping that will most likely improve the performance of the system. In its simplest form, the mapping could be a straightforward sequential split of the host LBA among the drives. FIG. 5A shows division of data into stripes in a single rotational drive. FIG. 5B shows sequential division of stripes among multiple rotational drives. In this mapping scheme, given for example 3 drives with 80 GB capacity each, the first 80 GB seen by the host will be mapped to the first drive, the second 80 GB to the second drive, and the last 80 GB to the third drive.
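The sequential split of the 3-drive example can be illustrated with a short sketch. The function name sequential_split is hypothetical, and equal-capacity drives with a 512-byte logical block are assumed for the example values.

```python
from typing import Tuple


def sequential_split(host_lba: int, blocks_per_drive: int,
                     num_drives: int) -> Tuple[int, int]:
    """Map a host LBA to (drive index, HDD LBA) under a sequential split.

    Assumes all drives have the same capacity, expressed in logical blocks.
    """
    if not 0 <= host_lba < blocks_per_drive * num_drives:
        raise ValueError("host LBA is outside the combined capacity")
    # The first blocks_per_drive LBAs go to drive 0, the next to drive 1, ...
    return host_lba // blocks_per_drive, host_lba % blocks_per_drive


# 80 GB drive with 512-byte logical blocks (example values only).
BLOCKS_80GB = 80 * 10**9 // 512
```

With these values, host LBA 0 lands on the first drive, host LBA BLOCKS_80GB is the first block of the second drive, and the last host LBA is the last block of the third drive.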

This mapping scheme is the simplest but not very efficient. A better mapping would spread the data across the drives to maximize the possibility of concurrent operations. In this type of mapping, the firmware will distribute the stripes across the drives such that sequential stripes are stored in multiple drives. FIG. 5C shows distributed stripes across multiple hard drives. In a system with 3 or more hard drives, distributed parity can be added for a RAID5-like implementation as shown in FIG. 5D.

The size of each stripe is configured at first boot-up. An example configuration is setting the stripe size equal to the cache line size and setting the cache line size equal to the native flash block size or to the flash section size. In a system with a host LBU of X bytes and flash devices with a block size of Y bytes, a data stripe and a cache line will each consist of Y divided by X host logical blocks. FIG. 6 shows an example cache line for a 16 KB flash section and 512 byte LBU.

FIG. 7 is a flowchart for the initialization part of data striping in the hybrid storage device. If local virtualization is active, firmware initiates discovery of attached hard drives, and gets the flash section size to be used as the reference size for the stripe. If the number of detected drives is greater than two, the drives have equal capacities, and the RAID5 feature is set to on, the RAID5 configuration is selected and parity stripes are assigned in addition to data stripes. If there are only two drives, plain striping is implemented.
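As a worked instance of the Y divided by X relation, the following sketch computes the number of host logical blocks per cache line (and per stripe). The function name is hypothetical, and the check that Y is a whole multiple of X is an added assumption.

```python
def blocks_per_cache_line(flash_unit_bytes: int, host_block_bytes: int) -> int:
    """Number of host logical blocks in one cache line / data stripe (Y / X)."""
    if flash_unit_bytes % host_block_bytes != 0:
        # Assumption: the flash unit is a whole multiple of the host block.
        raise ValueError("flash unit must be a multiple of the host block size")
    return flash_unit_bytes // host_block_bytes


# 16 KB flash section with a 512-byte LBU gives 32 host logical blocks.
LBAS_PER_LINE = blocks_per_cache_line(16 * 1024, 512)
```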
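The layout decision in this flow can be sketched as below. The function name and the returned labels are hypothetical. Two branches are assumptions not stated in the text: a single detected drive is treated as a single-drive layout, and three or more drives that do not qualify for RAID5 fall back to plain striping.

```python
from typing import Sequence


def select_layout(drive_capacities: Sequence[int], raid5_enabled: bool) -> str:
    """Choose a striping layout from the discovered drives.

    Returns "single", "striping", or "raid5".
    """
    count = len(drive_capacities)
    if count == 0:
        raise ValueError("no drives detected")
    if count == 1:
        return "single"  # assumption: nothing to stripe across
    equal_capacity = len(set(drive_capacities)) == 1
    if count > 2 and equal_capacity and raid5_enabled:
        return "raid5"   # parity stripes assigned in addition to data stripes
    return "striping"    # two drives, or RAID5 conditions not met (assumed)
```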

Pre-Fetching

At initialization, the hybrid storage device firmware offers the option to pre-fetch data from the rotational drives to L1 cache. Since rotational drives are slow on random accesses, firmware by default may choose to pre-fetch from random areas in rotational drives. A more flexible option is for the firmware to provide an external service in the form of a vendor-specific interface command to allow the host to configure the pre-fetching method to be used by the firmware at boot-up.

If the system is being used for storing large contents such as video, the firmware can be configured to pre-fetch sequential data. If the system is being used for database applications, it can be configured to pre-fetch random data. If fastest boot-up time is required, pre-fetch may also be disabled.

In another possible configuration, the system may support a host-controlled Non-Volatile Cache command set. This allows the host to lock specific data in the non-volatile L2 cache so that they are immediately available at boot-up time. When the firmware detects that data was pinned by the host in the L2 non-volatile cache, it automatically pre-fetches those data.

FIG. 8 shows the flowchart for doing data pre-fetching at boot-up time.

Caching Mode

In FIG. 1, the rotational drives 105 have the largest storage capacity. The flash devices 103, acting as second level cache, may have less capacity. The SDRAM 104, acting as first level cache, may have the least capacity. Both L1 cache and L2 cache can be full-associative, set-associative, or directly-mapped. In a full-associative cache, data from any address can be stored to any of the cache lines. In a set-associative cache, data from a specific address can be mapped to a certain set of cache lines. In a directly-mapped cache, each address in storage can be cached only to one specific cache line.

FIG. 9A shows an illustration of a set-associative L2 cache, where the flash devices are divided among the rotational drives. Data from HDD0 can be cached to any of the 8 flash devices assigned to HDD0 (FDEV 00, FDEV 04, FDEV 08, FDEV 12, FDEV 16, FDEV20, FDEV24, and FDEV28), data from HDD1 can be cached to any of the 8 flash devices assigned to HDD1, and so on.

FIG. 9B shows an illustration of a directly-mapped L2 cache. In this setup, each of the four hard drives has dedicated flash devices where their data can be cached. In this example, data from HDD0 can only be cached in FDEV00, HDD1 in FDEV01, and so on.

FIG. 9C is an illustration of a full-associative L2 cache. In this setup, data from any of the four drives can be cached to any of the four flash devices.
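The difference between the three schemes can be illustrated by listing which cache lines are allowed to hold a given storage stripe. This is a generic sketch: the function name is hypothetical, and the modulo placement used for the directly-mapped and set-associative cases is a conventional choice assumed here, not a mapping taken from the figures.

```python
from typing import List


def candidate_lines(stripe_index: int, num_lines: int, mode: str,
                    ways: int = 1) -> List[int]:
    """Return the cache line indices that may hold the given stripe."""
    if mode == "direct":
        # Exactly one possible cache line per address.
        return [stripe_index % num_lines]
    if mode == "set":
        # One set of `ways` adjacent lines per address.
        num_sets = num_lines // ways
        first = (stripe_index % num_sets) * ways
        return list(range(first, first + ways))
    if mode == "full":
        # Any cache line may hold any address.
        return list(range(num_lines))
    raise ValueError("mode must be 'direct', 'set', or 'full'")
```

For an 8-line cache, stripe 10 has one candidate line when directly-mapped, two candidates when 2-way set-associative, and all eight when full-associative.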

Full-associative caching has the advantage of cache usage efficiency, since all cache lines will be used regardless of the locations of data being accessed. In the full-associative caching scheme, the firmware keeps cache line information for each set of available storage. In a system with N cache lines, where N is computed as the available cache memory divided by the size of each cache line, the firmware will store information for M cache lines, where M is computed as the total storage capacity of the system divided by the cache line size. This information is used to keep track of the state of each storage stripe.

FIG. 10 shows an example table for storing cache line information in a full-associative caching system. Since each storage stripe has its own entry in the table, firmware can easily determine a stripe's caching state and location.

L1 Index is the cache line/cache control block number. HDD ID is the sequential index of the rotational drive where the data resides. HDD LBA is the first hard-disk LBA assigned to the cache line. L1 Address is the actual memory address where data resides in L1, and L2 Address is the physical address of the location of data in L2.

The HDD ID and HDD LBA can be derived at runtime to minimize memory usage of the table. L1 Cache State and L2 Cache State specify whether the SDRAM and/or the flash contain valid data. If valid, they also specify whether the data is clean or dirty. A dirty cache contains a more up-to-date copy of data than the one in the actual storage media, which in this case is the rotational drive. Cache Sub-State specifies whether the cache is locked because of an on-going transfer between SDRAM and Host (sdram2host or host2sdram), SDRAM and Flash (sdram2flash or flash2sdram), or SDRAM and rotational drive (sdram2hdd or hdd2sdram).

Direct-mapping is less efficient in terms of cache memory usage, but takes less storage for keeping cache line information. In a system with N cache lines, where N is computed as the available cache memory divided by the size of each cache line, the firmware can store information for as few as N cache lines. When checking for cache hits, firmware derives the cache line index from the host LBA, and looks directly at the assigned cache line information. Firmware compares the cache-aligned host LBA to the start of the currently cached LBA range and declares a hit if they are the same.

At initialization, firmware allocates memory for storing the cache information. The amount of memory required for this depends on the caching method used, as discussed above.
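The two table sizes can be computed as in the sketch below. The function name and the example capacities (512 MiB of cache, 256 GiB of storage, 16 KiB cache lines) are illustrative assumptions only.

```python
from typing import Tuple


def table_sizes(cache_bytes: int, storage_bytes: int,
                line_bytes: int) -> Tuple[int, int]:
    """Return (N, M): cache lines available and table entries kept.

    N = available cache memory / cache line size
    M = total storage capacity / cache line size
    """
    n = cache_bytes // line_bytes
    m = storage_bytes // line_bytes
    return n, m


# Example values only: 512 MiB cache, 256 GiB storage, 16 KiB cache lines.
N, M = table_sizes(512 * 2**20, 256 * 2**30, 16 * 2**10)
```

In this example the full-associative table holds 512 times as many entries as there are cache lines, which illustrates why the table itself can become large.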
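One possible shape for a table entry is sketched below. The class and method names are hypothetical, the NOP value is added to represent an unlocked line, and the derived HDD ID and HDD LBA assume distributed striping without parity with the stripe size equal to the cache line size.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubState(Enum):
    NOP = "nop"  # assumed marker for "no transfer in progress"
    SDRAM2HOST = "sdram2host"
    HOST2SDRAM = "host2sdram"
    SDRAM2FLASH = "sdram2flash"
    FLASH2SDRAM = "flash2sdram"
    SDRAM2HDD = "sdram2hdd"
    HDD2SDRAM = "hdd2sdram"


@dataclass
class CacheLineInfo:
    l1_index: int                      # cache line / control block number
    l1_address: Optional[int] = None   # memory address of the data in L1
    l2_address: Optional[int] = None   # physical address of the data in L2
    l1_valid: bool = False
    l1_dirty: bool = False
    l2_valid: bool = False
    l2_dirty: bool = False
    sub_state: SubState = SubState.NOP

    def hdd_id(self, num_hdd: int) -> int:
        """Drive index, derived at runtime instead of being stored."""
        return self.l1_index % num_hdd

    def hdd_lba(self, num_hdd: int, stripe_blocks: int) -> int:
        """First HDD LBA assigned to this line, derived at runtime."""
        return stripe_blocks * (self.l1_index // num_hdd)

    def is_locked(self) -> bool:
        return self.sub_state is not SubState.NOP
```

Keeping the drive index and drive LBA as computed values rather than stored fields trades a few arithmetic operations per lookup for a smaller table entry.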
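The direct-mapped hit check can be sketched as follows. The function name and the use of a plain list holding the start LBA cached in each line (None when the line is empty) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def lookup_direct_mapped(host_lba: int,
                         cached_start_lba: List[Optional[int]],
                         blocks_per_line: int) -> Tuple[int, bool]:
    """Return (cache line index, hit?) for a host LBA.

    cached_start_lba[i] is the first host LBA currently cached in line i.
    """
    num_lines = len(cached_start_lba)
    # Align the host LBA down to the start of its cache-line-sized range.
    aligned_lba = host_lba - host_lba % blocks_per_line
    # Derive the one cache line this range may occupy.
    line = (host_lba // blocks_per_line) % num_lines
    # Hit only if the line currently holds exactly this range.
    return line, cached_start_lba[line] == aligned_lba


table: List[Optional[int]] = [None, 32, None, None]  # line 1 caches LBAs 32..63
```

Here host LBA 40 hits line 1, while host LBA 168 maps to the same line but misses because that line holds a different range.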

Address Translation

Cache states stored in the cache line information or cache control block outlined in FIG. 10 specify the validity of the data copy in the L1 and L2 caches. After inspection of cache states, the next step in processing an IO command is to locate the exact address of the target data, which is also stored in the cache line table in FIG. 10. If data is neither in L1 nor in L2, a HostLBA2HDDLBA translation formula is used to derive the addresses of the hard disk logical blocks where the data is stored.

The host LBA size is usually smaller than the flash block of L2, thus a set of logical blocks is addressed by a single physical block. For example, given a 512-byte host LBA and 16 KB flash block, 32 LBAs will fit into one flash block. Therefore, only one entry in the table is needed for each set of 32 host logical blocks.

The cache information table is stored in non-volatile memory and fetched at boot-up time. In systems with very large storage capacities, it might not be practical to copy the entire table to volatile memory at boot-up time due to the boot-up speed requirement and the limitation of available volatile memory. At boot-up only a small percentage of the table is copied to volatile memory. In effect, the cache control block table is also cached. If the table entry associated with an IO command being serviced is not in volatile memory, it will be fetched from the non-volatile memory and will replace a set of previously cached entries.

The HostLBA2HDDLBA translation formula depends on the mapping method used to distribute the host logical blocks across the rotational drives. For example, if host data is striped across 4 rotational drives and parity is not implemented, the formula would look like the following:

HDDLBA = StripeSz * (SDRAMIdx / NumHDD) + HostLBA % StripeSz

The index to the rotational drive can be derived through the formula:

HDDIdx=SDRAMIdx % NumHDD

In the first equation, StripeSz is specified in terms of logical block units.
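The translation can be checked with a small sketch for the 4-drive, no-parity case. Several points are assumptions: SDRAMIdx is taken to be the cache line index of the host LBA (HostLBA divided by StripeSz, with the stripe size equal to the cache line size), the quotient in the first equation is read as an integer division of SDRAMIdx by NumHDD so that the first stripe on each drive starts at HDD LBA 0, and the function name is hypothetical.

```python
from typing import Tuple


def host_lba_to_hdd(host_lba: int, stripe_sz: int,
                    num_hdd: int) -> Tuple[int, int]:
    """Translate a host LBA to (HDDIdx, HDDLBA) for plain distributed striping."""
    sdram_idx = host_lba // stripe_sz   # assumed: cache line / stripe index
    hdd_idx = sdram_idx % num_hdd       # drive that holds the stripe
    # Stripe row on that drive, plus the offset inside the stripe.
    hdd_lba = stripe_sz * (sdram_idx // num_hdd) + host_lba % stripe_sz
    return hdd_idx, hdd_lba
```

With 32-block stripes on 4 drives, host LBAs 0 to 31 map to drive 0, 32 to 63 to drive 1, and host LBA 128 wraps back to drive 0 at HDD LBA 32.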

Cache State Transitions

The firmware keeps track of data in the L1 and L2 caches using a set of cache states which specify the validity and span of data in each cache line. The cache state information is part of the cache information table in FIG. 10. Each cache level has its own cache state, and in addition, the cache sub-state field specifies whether the cache line is locked due to an ongoing data transfer between caches, between the medium and a cache, or between the host and a cache. Although the cache states are presented in the table as one data field, the representation in the actual implementation is not restricted to using a single variable for each cache state. For example, it may be a collection of flags and page bitmaps which, when treated collectively, still equate to one of the possible distinct states. The page bitmap is the accurate representation of which parts of the cache line are valid and which are dirty. As an example, the cache line 601 of FIG. 6 has 32 host LBAs, and the state of each LBA (whether valid, invalid, clean, or dirty) can be tracked by using two 32-bit bitmaps, ValidBitmap and DirtyBitmap. Each bit in the two variables represents one LBA in the cache line. For ValidBitmap, a bit set to one means the data in the corresponding LBA is valid. For DirtyBitmap, a bit set to one means the data in the corresponding LBA is more up to date than what is stored in the medium. The six possible cache states are: Invalid, Valid Partially Cached Dirty, Valid Fully Cached Partial Dirty, Valid Full Dirty, Valid Full Clean, and Valid Partially Cached Clean. The seven possible cache sub-states are: NOP, H2S, S2H, F2S, S2F, HDD2S and S2HDD.
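How two 32-bit bitmaps can collapse into one of the six named states is sketched below. The mapping is inferred from the state names rather than taken from the transition tables, the function name is hypothetical, and it is assumed that a dirty LBA is always also valid.

```python
LINE_BITS = 32
FULL = (1 << LINE_BITS) - 1  # all 32 LBAs of the cache line


def classify(valid_bitmap: int, dirty_bitmap: int) -> str:
    """Derive the cache state name from ValidBitmap and DirtyBitmap."""
    if not (0 <= valid_bitmap <= FULL and 0 <= dirty_bitmap <= FULL):
        raise ValueError("bitmaps must fit in 32 bits")
    if dirty_bitmap & ~valid_bitmap:
        raise ValueError("a dirty LBA must also be valid")  # assumption
    if valid_bitmap == 0:
        return "Invalid"
    if valid_bitmap != FULL:
        # Some LBAs of the line are not cached.
        if dirty_bitmap:
            return "Valid Partially Cached Dirty"
        return "Valid Partially Cached Clean"
    # Every LBA of the line is cached.
    if dirty_bitmap == 0:
        return "Valid Full Clean"
    if dirty_bitmap == FULL:
        return "Valid Full Dirty"
    return "Valid Fully Cached Partial Dirty"
```

For instance, a line whose lower 16 LBAs are valid and whose lowest 8 LBAs are dirty classifies as Valid Partially Cached Dirty.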

A sub-state of NOP (No Operation) indicates that the cache is idle. H2S indicates that the cache line is locked due to an ongoing transfer of data from the host to L1. S2H indicates that the cache line is locked due to an ongoing transfer of data from L1 to host. F2S indicates that the cache line is locked due to an ongoing transfer from L2 to L1. S2F indicates that the cache line is locked due to an ongoing transfer from L1 to L2. HDD2S indicates that the cache line is locked due to an ongoing transfer from the hard disk to L1. Finally, S2HDD indicates that the cache line is locked due to an ongoing transfer from L1 to the hard disk drive.

An Invalid cache line does not contain any data or contains stale data. Initially, all caches are invalid until filled up with data during pre-fetching or processing of host read and write commands. A cache line is invalidated when a more up-to-date copy of data is written to a lower-level cache, thus making the copy of the data in higher level caches invalid (15038, 15040, 15041, 15043, 15046, 15048, 15050, 15055, 15057, 15059, 15060, 15062, 15068, 15071, 15072, 15074, 15076, 15080, 15084, 15085, 15090, 15095, 15096, 15100, 15101, 15103, 15105, 15109, 15115, 15116, 15119, 15120, 15124, 15125, 15127, 17049 and 17050). For example, if a dirty cache line in L1 is copied to L2 so that L1 can be freed up during an L1 cache full condition, and later a new version of that data is written to L1 by the host, the copy in L2 becomes old and unusable, so the firmware invalidates the cache line in L2. From an invalid state, a write to an L1 cache line by the host will result in switching of state to either Valid Partially Cached Dirty (15037, 15039, 15040, 15042, 15044, 15045, 15047, 15049 and 15050) or Valid Full Dirty (15036, 15038, 15041, 15043, 15046 and 15048), depending on whether the data spans the entire cache line or not. On the other hand, a read from the medium to L1 makes an Invalid cache line either Valid Partially Cached Clean (19036, 19038, 19039, 19041 and 19043) or Valid Full Clean (19037, 19040 and 19044). Finally, a read from L2 to an invalid L1 could result in inheritance of L2's state by L1 (17037, 17038, 17040, 17042 and 17044). However, if the data from L2 is not enough to fill the entire L1 cache line, the resulting state of L1 would either be Valid Partially Cached Clean (17039) or Valid Partially Cached Dirty (17041 and 17043). From an invalid state, a write to an L2 cache line will result in inheritance of state from L1 to L2 (18042, 18050, 18056, 18062, and 18068).

Valid Partially Cached Dirty state indicates that the cache line is partially filled with data and some or all of these data are dirty. A dirty copy of data is more up-to-date than what is stored in the actual medium. An example sequence that will result in this state is a partial Write FUA command to an Invalid cache line followed by a partial normal Write command. The Write FUA command partially fills the L1 cache line with clean data (19036, 19038, 19039, 19041 and 19043), and the normal Write command makes the partial L1 cache line dirty (15114, 15117, 15118, 15121, 15127 and 15128). L1 cache will take on a Valid Partially Cached Dirty state whenever new data transferred from the host or L2 cache is not enough to fill its entire cache line (15037, 15039, 15040, 15042, 15044, 15045, 15047, 15049, 15050, 15053, 15058, 15059, 15066, 15067, 15070, 15075, 15076, 15114, 15117, 15118, 15121, 15127, 15128, 17037, 17041, 17043, 17047, 17052, 17054 and 17075). Transfer of data from the hard disk drive or from L2 to a Valid Partially Cached Dirty L1 occurs when the firmware wants to fill the un-cached portions of the L1 cache. When the transfer completes, the L1 cache becomes either Valid Full Dirty (17049 and 17051) or Valid Fully Cached Partial Dirty (17046, 17050, 17053, 19046, 19048 and 19053), depending on whether the entire cache line became dirty or not. However, for cases wherein the data transferred from the hard disk drive or L2 cache is not enough to fill all un-cached portions of the L1 cache, its state remains Valid Partially Cached Dirty (17047, 17052, 17054, 19045, 19047 and 19052). Flushing of dirty bytes from a Valid Partially Cached Dirty L1 to the medium will either cause its state to change to Valid Partially Cached Clean (20042, 20044, 20046, 20049, 20051 and 20053) or leave it in Valid Partially Cached Dirty (20043, 20045, 20047, 20050 and 20054), depending on whether all dirty bytes were flushed to the medium or just a portion of them. 
A host write to a Valid Partially Cached Dirty L1 either makes it Valid Full Dirty or Valid Fully Cached Partial Dirty, or leaves it as Valid Partially Cached Dirty, depending on the span of data written by the host. If the new data covers the entire cache line, it naturally becomes Valid Full Dirty (15051, 15055, 15062, 15068, and 15071). If the new data fills all un-cached bytes and all clean bytes, L1 still becomes Valid Full Dirty (15052, 15056, 15057, 15063, 15065, 15069, 15072 and 15074). If the new data fills all un-cached bytes but some bytes remain clean, L1 becomes Valid Fully Cached Partial Dirty (15054, 15060, 15064 and 15073). Finally, if the new data does not fill all un-cached bytes, L1 stays as Valid Partially Cached Dirty (15053, 15058, 15059, 15066, 15067, 15070, 15075 and 15076). L2 will switch to Valid Partially Cached Dirty state if a Valid Partially Cached Dirty L1 is copied to it (18042, 18043 and 18048) and the copied data does not fill the entire cache line of L2. Data transfer from the host to L1 could invalidate some of the data in L2, effectively causing L2's state to switch to Valid Partially Cached Dirty (15044, 15047, 15063, 15065, 15067, 15069, 15070, 15098 and 15110). L2 will likewise switch to Valid Partially Cached Dirty if it shares the same set of data with L1 and some of the dirty bytes in L1 are flushed to the medium (20079). When new data written by the host to L1 overlaps with the data in L2, the L2 copy becomes Invalid (15038, 15040, 15055, 15057, 15059, 15060, 15090, 15105, 15115 and 15116) or Valid Partially Cached Clean (15092 and 15118); otherwise it stays in its Valid Partially Cached Dirty state (15039, 15056, 15058, 15091, 15093, 15106 and 15117). A transfer from L1 to L2 could also change L2's state from Valid Partially Cached Dirty to Valid Full Dirty (18044 and 18063) or Valid Fully Cached Partial Dirty (18057, 18069 and 18070), depending on whether the entire cache line became dirty or not as a result of the data transfer. 
If the dirty bytes in L1 that are flushed to the medium coincide with the dirty bytes in L2, the L2 copy becomes Valid Partially Cached Clean (20044, 20062, 20072 and 20073).

A Valid Full Clean state indicates that the entire cache line is filled with data that is identical to what is stored in the actual medium. This happens when un-cached data is read from the medium to L1 (19037, 19040, 19044, 19073, 19076 and 19080), or when data in L1 is flushed to the medium (20049, 20060, 20062, 20065, 20068, 20070, 20072 and 20077). A data transfer from L1 could also result in a Valid Full Clean state for L2 if the data copied to L2 matches what is stored in the hard disk (18050 and 18075). Likewise, L1 will switch to a Valid Full Clean state (17038, 17076 and 17080) following a transfer from L2, if the cached data in L2 matches what is stored in the hard disk and the data transferred from L2 is enough to fill the entire L1 cache line. When written with new data, a Valid Full Clean cache line becomes either Valid Full Dirty (15077, 15080, and 15084) or Valid Fully Cached Partial Dirty (15078, 15081, 15085 and 15086), depending on whether the new data spans the entire cache line or not. A Valid Full Clean L2 could become Valid Partially Cached Clean (15042, 15081 and 15121) or could be invalidated (15041, 15080, 15119 and 15120), depending on whether new data written to L1 by the host invalidates a portion or the entire content of L2.

The Valid Fully Cached Partial Dirty state indicates that the entire cache line is filled with data and some of the data are dirty. An example sequence that will result in such a state is a Read FUA command of the entire cache line followed by a partial Write command. The Read FUA command copies the data from the medium to L1, making L1 Valid Full Clean (19037, 19040, 19044, 19054, 19056, 19059, 19073, 19076 and 19080), and the following partial Write command makes some of the data in the cache line dirty (15078, 15081, 15085 and 15086). Writing data to un-cached portions of a partially filled L1 could likewise result in a Valid Fully Cached Partial Dirty state (15054, 15060, 15064, 15073, 15113, 15116, 15120, 15125, 15126, 17046, 17050, 17053, 17074, 19046, 19048 and 19053). Writing this L1 cache line to L2, in turn, makes L2 inherit the state of L1 as Valid Fully Cached Partial Dirty (18056, 18057 and 18061). Similarly, copying a Valid Fully Cached Partial Dirty L2 to L1 will make L1 inherit the state of L2 (17040 and 17050). Transferring data from L1 to un-filled portions of L2 would likewise cause L2's state to switch to Valid Fully Cached Partial Dirty (18049, 18069 and 18070). A Valid Fully Cached Partial Dirty L1 will remain in this state until a portion of the dirty bytes in L1 is flushed to the medium, after which it shifts to a Valid Full Clean state (20061, 20063, 20066 and 20069). Furthermore, when the host writes new data to the L1 cache, L1 either stays as Valid Fully Cached Partial Dirty (15089, 15092, 15097, 15098, 15102 and 15103) or becomes Valid Full Dirty (15087, 15088, 15090, 15091, 15093, 15095, 15096, 15100 and 15101). A Valid Fully Cached Partial Dirty L2 cache, on the other hand, either gets invalidated (15043, 15062, 15095 and 15096), switches to Valid Partially Cached Dirty state (15044, 15063, 15065, 15067 and 15098) or changes state to Valid Partially Cached Clean (15045, 15064, 15066 and 15097) following a transfer from the host to L1. 
Copying data from a Valid Fully Cached Partial Dirty L2 to L1 would likewise invalidate the contents of L2 (17049 and 17050). Flushing all cached dirty bytes from L1 will cause L2's state to change from Valid Fully Cached Partial Dirty to Valid Full Clean (20049 and 20065); otherwise, L2 stays in Valid Fully Cached Partial Dirty state (20050, 20051 and 20067).

The Valid Full Dirty state indicates that the entire cache line contains newer data than what is stored in the medium. L1 may become Valid Full Dirty from any state (i.e. Invalid: 15036, 15038, 15041, 15043, 15046 and 15048; VPCD: 15051, 15052, 15055, 15056, 15057, 15062, 15063, 15065, 15068, 15069, 15071, 15072 and 15074; VFCPD: 15087, 15088, 15090, 15091, 15093, 15095, 15096, 15100 and 15101; VFD: 15104, 15105, 15106, 15109 and 15110; VFC: 15077, 15080 and 15084; VPCC: 15112, 15115, 15119 and 15124) once the host writes enough data to it to make all its data dirty. Aside from this, a Valid Full Dirty L1 may also be a result of a previously empty or Valid Partially Cached Dirty L1 that has been filled with dirty bytes from L2 (17042, 17049 and 17051). A Valid Full Dirty L1 will stay in this state until flushed out to the medium, after which it will become Valid Full Clean (20071, 20073 and 20078) or Valid Fully Cached Partial Dirty (20072, 20074, 20075 and 20079). A Valid Full Dirty L2 is a result of data transfer from a Valid Full Dirty L1 to L2 (18062 and 18063) or when new data copied from L1 is enough to fill all portions of L2 (18044). L2 will stay in this state until the host writes new data to L1, which effectively invalidates portions or the entire data in L2. If only a portion of the cached data in L2 is invalidated, a Valid Full Dirty L2 switches to Valid Partially Cached Dirty state (15047, 15069, 15070 and 15110); otherwise it switches to Invalid state (15046, 15068 and 15109). The state of L2 could also change from Valid Full Dirty to Valid Partially Cached Dirty (20079) or Valid Full Clean (20078), depending on whether all or just a portion of the dirty bytes in L1 was flushed to the medium.

The Valid Partially Cached Clean state indicates that the cache is partially filled with purely clean data. For L1, this may be a result of a partial Write FUA (20081, 20082, 20083 and 20086), a partial Read FUA command (19036, 19038, 19039, 19041, 19043, 19072, 19074, 19075 and 19079), flushing of a partially cached dirty L1 to the hard disk drive (20042, 20044, 20046, 20049, 20051 and 20053), or a data transfer from L2 that did not fill the entire L1 cache line (17039, 17044, 17077 and 17081). A Valid Partially Cached Clean L1 will transition to Valid Full Clean if the remaining un-cached data are read from the medium (19073, 19076 and 19080) or from L2 (17076 and 17080) to L1. When the host writes to a Valid Partially Cached Clean L1, the L1 state will transition to Valid Full Dirty, Valid Fully Cached Partial Dirty, or Valid Partially Cached Dirty. If the written data covers the entire cache line, L1 becomes Valid Full Dirty (15112, 15115, 15119 and 15124). If the new data does not cover the entire cache line and leaves some data un-cached, L1 becomes Valid Partially Cached Dirty (15114, 15117, 15118, 15121, 15127 and 15128). If the new data does not cover the entire cache line but was able to fill all un-cached data, L1 becomes Valid Fully Cached Partial Dirty (15113, 15116, 15120, 15125 and 15126). When data from L2 is copied to a Valid Partially Cached Clean L1, it could likewise transition to Valid Partially Cached Dirty state (17075), Valid Fully Cached Partial Dirty (17074), Valid Full Clean (17076 and 17080), or Valid Partially Cached Clean (17077 and 17081). A Valid Partially Cached Clean L2 is the result of a Valid Partially Cached Clean L1 being written to L2 (18068 and 18074), or a Valid Partially Cached Dirty L1 being flushed out to the medium (20044, 20053 and 20054). A Valid Partially Cached Clean L2 could likewise result from a host to L1 transfer whenever some of the data in L2 gets invalidated (15042, 15045, 15064, 15066, 15081, 15092, 15097, 15118 and 15121). 
When the host writes to L1, the entire contents of a Valid Partially Cached Clean L2 would be invalidated if the data transferred by the host overlaps with the contents of L2 (15038, 15040, 15055, 15057, 15059, 15060, 15090, 15085, 15105, 15115 and 15116); otherwise it stays in Valid Partially Cached Clean state (15092 and 15118). Transferring new data bytes from L1 will cause a transition of L2's state from Valid Partially Cached Clean to Valid Partially Cached Dirty state (18048) or Valid Fully Cached Partial Dirty state (18049 and 18061), depending on whether the copied data from L1 fills the entire L2 cache line or not. A Valid Partially Cached Clean L2 could also transition to Valid Full Clean (18075) if the data transferred from L1 fills the empty cache bytes of L2; otherwise, L2 stays in Valid Partially Cached Clean state (18074).

FIGS. 15A, 15B, 15C₁, 15C₂, and 15B, FIGS. 16A and 16B, FIGS. 17A and 17B, FIGS. 18A and 18B, FIGS. 19A and 19B and FIGS. 20A and 20B show the complete state transition tables for a hybrid storage system with two levels of cache. For systems with more than two cache levels, the additional table entries can be derived using the same concepts used in the existing tables.
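
The six cache states described above can be summarized by two questions per unit of the cache line: is it cached, and is it dirty. The sketch below is illustrative only and assumes a per-unit bitmap representation (the function name `classify` and its list arguments are hypothetical, not taken from the figures).

```python
def classify(cached, dirty):
    """Return the cache state name for one cache line.

    cached, dirty: equal-length lists of 0/1 flags, one per byte or segment.
    """
    if any(d and not c for c, d in zip(cached, dirty)):
        raise ValueError("a dirty unit must also be cached")
    if not any(cached):
        return "INVLD"                       # nothing usable in the line
    full = all(cached)
    dirty_units = sum(dirty)
    if dirty_units == 0:
        return "VFC" if full else "VPCC"     # only clean data
    if not full:
        return "VPCD"                        # partially cached, some dirty
    return "VFD" if dirty_units == len(cached) else "VFCPD"
```

For example, a four-unit line with all units cached and one unit dirty classifies as VFCPD.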

Read Command

The succeeding paragraphs discuss in detail the processing of a Read command by a hybrid storage device, as described by the flow chart illustrated in FIG. 11A. The process performs different types of cache operations, which make use of different cache transition tables. The cache transition tables used are also discussed.

When the firmware receives a Read command from the host, it derives the cache control block index (SDRAM Index) based on the host LBA. Then it checks the designated cache control block to determine whether the requested LBA is in L1 cache.

If L1 cache is valid and the associated cache control block entry is for the requested block, firmware starts data transfer from L1 cache to host and updates cache sub-status to S2H (SDRAM to host). Note that there are 5 defined valid cache states (valid full clean (VFC), valid full dirty (VFD), valid partially cached clean (VPCC), valid partially cached dirty (VPCD), and valid fully cached partial dirty (VFCPD)), and before firmware can initiate L1 cache to host data transfer and update the cache sub-state to S2H, it must first check whether there is an ongoing locked cache operation. Should there be any ongoing locked cache operation, the firmware will wait until the operation is finished (that is, until the current cache sub-state becomes NOP) before initiating the data transfer from L1 cache to host. FIGS. 16A and 16B list the 5 defined valid states for L1 cache (and other states) and the combinations with L2 cache state and cache sub-state values for allowable and non-allowable data transfer from L1 cache to host. As an example, assume the data targeted by the received Read command from the host is LBA 0-99 and is determined to be in L1 cache based on the cache line information table. Based on FIGS. 16A and 16B, firmware may execute a read from L1 cache to host provided that the current cache sub-state is NOP. Note also that an S2H operation can be initiated regardless of the current valid state of L2 cache, since the content of the L1 cache is always the latest or most updated copy.

If L1 cache is valid but a different entry is stored in the associated cache control block (for the case of a directly mapped cache), the firmware initiates the freeing of that cache. If that cache is clean, it can be freed instantly without any flush operation. But if the cache is dirty, firmware gets the associated flash physical location of the data from the cache control info and, after determining that there is enough space in L2 cache for the L1 cache content, initiates copying of the data to L2 cache, which is faster than flushing to the rotational drive. Then it updates sub-status to sdram2flash (S2F). Refer to “movement from L1 cache to L2 cache” for a detailed discussion of this cache operation. FIGS. 18A and 18B list the cache state transitions for L1 cache to L2 cache data transfer.

However, if L2 cache is full, flushing to the rotational drive will be initiated instead, and sub-status will be set to S2HDD (SDRAM to hard disk drive). Flushing of L2 cache to rotational drives can also be done in the background when firmware is not busy servicing host commands. After flushing of L1, firmware proceeds with the steps below as if the data is not in L1 cache. Refer to the “flushing of L1 cache” subsection of this document for a detailed discussion of the flushing of L1 cache mentioned in the Read operation. FIGS. 20A and 20B list the cache state transitions for L1 cache to rotational drive data transfer.
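
The decision just described, for an L1 cache line that holds a different entry than the one requested, can be sketched as follows. This is an illustrative reduction to two inputs; the function name `eviction_step` and the "FREE" label are hypothetical.

```python
def eviction_step(line_is_dirty, l2_has_room):
    """Pick the operation that frees a mismatched L1 cache line."""
    if not line_is_dirty:
        return "FREE"    # clean: freed instantly, no flush operation
    if l2_has_room:
        return "S2F"     # dirty: copy to L2, faster than the rotational drive
    return "S2HDD"       # dirty and L2 full: flush to the rotational drive
```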

If data is not in L1 cache, firmware checks state of L2 cache.

If L2 cache is valid, firmware gets the physical location of the data based on the L2 address field of the cache control info table, starts transfer from L2 cache to L1 cache, and updates sub-status to F2S (flash to SDRAM). FIGS. 17A and 17B list the current L1 cache state, L2 cache state, and cache sub-state condition requirements for the F2S operation. Based on the table, an F2S operation can be initiated when the current cache sub-state is NOP and the current L1 cache state is INVLD, VPCD, or VPCC. As with the previously mentioned cache operations, F2S can only be initiated by firmware if there is no ongoing locked cache operation. If there is no available L1 cache (L1 cache full), firmware selects an L1 cache victim. If the selected victim is clean, or if it is dirty but consistent with the copy in L2 cache, it is freed instantly. Otherwise, it is flushed to the rotational drive. The cache is then invalidated and assigned to the current command being serviced. For example, suppose the read command is requesting LBA 20-25, which is located in L2 cache.

Assuming the configuration is 10 LBAs per L1 cache line or index, the requested LBAs are mapped to L1 index #2 of the cache control information table. To start the transfer of the data from L2 cache to L1 cache, firmware checks whether the L1 cache is full. If not full, firmware searches for an available L1 address (e.g. 0x0000_3000), assigns it to L1 index #2, and sets the cache sub-state value from NOP to F2S. However, if the current L1 cache is full (VFC, VFD, or VFCPD), an L1 address is selected as a victim. Assuming the selected L1 address is 0x0001_0000, firmware checks from the L1 segment bitmap whether the content of this address is clean or dirty. If clean, the address is invalidated. If dirty, firmware flushes to the rotational drive if needed before invalidating the selected L1 address. Once invalidated, firmware initiates the LBA 20-29 transfer from L2 cache to L1 cache address 0x0001_0000 once the current cache sub-state is NOP. After completing the data transfer, firmware updates the L1 cache state and sets the cache sub-state back to NOP.
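
The index arithmetic in this example can be checked with a short sketch. The exact derivation of the SDRAM Index is not given in the text; integer division by the line size is an assumption that reproduces the example, and the names `LBAS_PER_LINE`, `l1_index` and `line_lba_range` are hypothetical.

```python
LBAS_PER_LINE = 10  # configuration assumed in the example above


def l1_index(lba):
    """Cache control table index for a host LBA (assumed: integer division)."""
    return lba // LBAS_PER_LINE


def line_lba_range(index):
    """First and last LBA covered by the cache line at the given index."""
    first = index * LBAS_PER_LINE
    return first, first + LBAS_PER_LINE - 1
```

With these assumptions, every LBA from 20 to 25 maps to index 2, and the line at index 2 covers LBA 20-29, which matches the range transferred from L2 cache in the example.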

If L2 cache is invalid, the firmware determines the physical location of the data in the rotational drives, starts transfer of data from the rotational drive to L1 cache, and updates sub-status to HDD2S (hard disk drive to SDRAM). For example, LBA 100-199 is being requested by a received Read command from the host, and based on the cache control information table, this LBA range is not in the cache (L1 and L2). After determining the physical location in the hard disk using the HostLBA2HDDLBA translation formula, firmware selects a free L1 cache address and initiates the data transfer from the hard disk to the selected L1 cache address when no L1 cache operation is happening.

Note that the HDD2S cache operation can also be initiated for other values of L2 cache state. FIGS. 19A and 19B list the current values of L1 cache state, L2 cache state, and cache sub-state, the allowable event for each cache combination, and the resulting states per event. Based on the figures, HDD2S can be initiated when: (1) the current L1 or L2 cache state is not full dirty, (2) the current L1 cache state is VPCD and the current L2 cache state is valid full, (3) the current L1 cache state is VFC and the current L2 cache state is dirty, (4) the current L1 cache state is VFCPD and the current L2 cache state is VFD, (5) the current L1 cache state is VPCC and the current L2 cache state is VFCPD or VFP, and (6) there is no ongoing cache operation. The case when the current L1 cache state is VFC or VFCPD and HDD2S is initiated is applicable only when the received command is Read FUA, where clean data is read directly from the hard disk regardless of whether there is a cache hit or not. Note also that if L1 cache is full, flushing of L1 cache is done before fetching from the HDD can occur.

Upon completion of S2H, firmware clears cache sub-status (NOP), sends command status to host, and completes the command. FIGS. 16A and 16B also list the cache state transitions when completing the data transfer from L1 cache to host. The figures detail the corresponding next L1 and L2 cache states based on their current states after finishing the data transfer. Based on the figures, L1 cache and L2 cache states are retained even after the host has completed reading from L1 cache (16042, 16043, 16045-16048, 16050, 16053-16055, 16057, 16059-16061, 16064, 16066-16068, and 16071). However, the cache sub-state transitions to NOP after the operation.

Upon completion of F2S, firmware updates the cache control block (L1 cache is now valid) and starts transfer from L1 cache to host. Sub-status is marked as S2H. FIGS. 17A and 17B also list the cache state transitions when completing the data transfer from L2 cache to L1 cache. The figures detail the corresponding next L1 and L2 cache states based on their current states after finishing the data transfer. As illustrated in the figures, the cache sub-state always transitions to NOP after the operation.

If the current L1 cache state is invalid, its next state is set depending on the current L2 cache state and the type of L2 to L1 data transfer. If the current L2 cache state is VPCD or VPCC, the L1 cache state is also set to the L2 cache state after the operation (17037 and 17044). If the current L2 cache state is VFC, VFCPD, or VFD, the L1 cache state is set depending on the 2 types of L2 to L1 data transfer event: (1) the entire L1 cache is filled after transferring the data from L2 cache and (2) L1 cache is not filled after the data transfer. If (1), L1 cache state is set to the L2 cache state (17038, 17040, and 17042). If (2), L1 cache state is set to VPCC if the current L2 cache state is VFC (17039), set to VPCD if the current L2 cache state is VFCPD (17041), or set to VPCD if the current L2 cache state is VFD (17043).

If the current L1 cache state is VPCD, its next state is set depending on the current L2 cache state. If the current L2 cache state is also VPCD, L1 cache state is set based on the 2 events described in the previous paragraph. If (1), L1 cache state is set to VFCPD (17046). If (2), L1 cache state is set to VPCD (17047). If the current L2 cache state is VFCPD, L1 cache state is set based on another 2 L2 to L1 data transfer events: (1) all un-cached bytes in L1 are dirty in L2 and (2) not all un-cached bytes in L1 are dirty in L2. If (1), L1 cache state is set to VFD (17049). If (2), L1 cache state is set to VFCPD (17050). For these 2 cases, L2 cache state is set to INVLD after the F2S operation. If the current L2 cache state is VFD, L1 cache state is set based on the former 2 events: (1) the entire L1 cache is filled after the operation and (2) L1 cache is not filled after the operation. If (1), L1 cache state is set to VFD (17051). If (2), L1 cache state is set to VPCD (17052). If the current L2 cache state is VPCC, L1 cache state is also set based on the 2 previous events. If (1), L1 cache state is set to VFCPD (17053). If (2), L1 cache state is set to VPCD (17054).

If the current L1 cache state is VPCC, its next state is set to VFCPD or VFC if the current L2 cache state is VPCD or valid clean, respectively (17074 or 17076/17080), for the case when the entire L1 cache is filled after the data transfer. L1 cache state is set to VPCD or VPCC if the current L2 cache state is VPCD or valid clean, respectively (17075 or 17077/17081), for the case when the entire L1 cache is not filled after the data transfer.
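
The F2S outcomes described in the last three paragraphs can be collected into a lookup table, sketched below for illustration. The table covers only the cases decided by whether the L1 cache line ends up filled; the VPCD/VFCPD combination is left out because it depends on a different condition. Reading "valid clean" as VFC or VPCC is an assumption, and the names `F2S_NEXT_L1` and `next_l1_after_f2s` are hypothetical.

```python
# (current L1, current L2) -> (next L1 if line filled, next L1 if not filled)
F2S_NEXT_L1 = {
    ("INVLD", "VPCD"): ("VPCD", "VPCD"),    # L1 takes the L2 state
    ("INVLD", "VPCC"): ("VPCC", "VPCC"),    # L1 takes the L2 state
    ("INVLD", "VFC"): ("VFC", "VPCC"),
    ("INVLD", "VFCPD"): ("VFCPD", "VPCD"),
    ("INVLD", "VFD"): ("VFD", "VPCD"),
    ("VPCD", "VPCD"): ("VFCPD", "VPCD"),
    ("VPCD", "VFD"): ("VFD", "VPCD"),
    ("VPCD", "VPCC"): ("VFCPD", "VPCD"),
    ("VPCC", "VPCD"): ("VFCPD", "VPCD"),
    ("VPCC", "VFC"): ("VFC", "VPCC"),       # assumed "valid clean" case
    ("VPCC", "VPCC"): ("VFC", "VPCC"),      # assumed "valid clean" case
}


def next_l1_after_f2s(l1_state, l2_state, line_filled):
    """Next L1 state after an L2-to-L1 transfer; KeyError if not modeled."""
    filled, not_filled = F2S_NEXT_L1[(l1_state, l2_state)]
    return filled if line_filled else not_filled
```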

Upon completion of HDD2S, firmware updates the cache control block (L1 cache is now valid) and starts transfer from L1 cache to host. FIGS. 19A and 19B also list the cache state transitions when completing the data transfer from rotational disks to L1 cache. The figures detail the corresponding next L1 and L2 cache states based on their current states after finishing the data transfer. The cache sub-state always transitions to NOP after the operation. Note that although the L2 cache state is not affected, since the operation only involves the L1 cache and the rotational drive, its current state affects the succeeding L1 cache state as listed in the figures.

If the current L1 cache state is INVLD, its next state is set depending on the current L2 cache state. If the current L2 cache state is INVLD, VFC, or VPCC, L1 cache state is set based on 2 events: (1) data from the hard drive did not fill the entire cache and (2) data from the hard drive filled the entire cache. If (1), L1 cache state is set to VPCC (19036, 19039, and 19043). If (2), L1 cache state is set to VFC (19037, 19040, and 19044). If the current L2 cache state is VPCD, L1 cache state is set to VPCC (19038). If the current L2 cache state is VFCPD, L1 cache state is set to VPCC (19041).

If the current L1 cache state is VPCD and the current L2 cache state is INVLD, VPCD, or VPCC, the L1 cache state is also set based on the 2 events discussed in the previous paragraph. If (1), L1 cache state is set to VPCD (19045, 19047, and 19052). If (2), L1 cache state is set to VFCPD (19046, 19048, and 19053).

If the current L1 cache state is VFC or VFCPD, the state is retained after the operation (19054, 19056, 19059-19063, and 19065).

If the current L1 cache state is VPCC, its next state is set depending on the current L2 cache state. If the current L2 cache state is INVLD, VFC, or VPCC, the L1 cache state is also set based on the 2 events discussed in a previous paragraph. If (1), L1 cache state is set to VPCC (19072, 19075, and 19079). If (2), L1 cache state is set to VFC (19073, 19076, and 19080). If the current L2 cache state is VPCD, L1 cache state is retained (19074).

In the background, when the interface is not busy, firmware initiates copying of L1 cache to L2 cache, flushing of L1 cache to rotational drives, and flushing of L2 cache to rotational drives.

Note that when the received command is Read FUA, the data is fetched from the rotational drive regardless of whether there is a cache hit or not. If there is, however, a cache hit for the Read FUA command and the cache is dirty, the cache is flushed to the rotational drive before the data is fetched.
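
As a simplified summary of the Read command handling above, the sequence of sub-states a read passes through can be sketched as below. The sketch ignores eviction of a mismatched or full L1 cache and models only a dirty L1 hit for Read FUA; the function name `read_steps` and its parameters are hypothetical.

```python
def read_steps(l1_hit, l2_hit, fua=False, l1_dirty=False):
    """Ordered list of sub-states used to service one Read command."""
    if fua:
        steps = ["HDD2S", "S2H"]          # always fetch from the drive
        if l1_hit and l1_dirty:
            steps.insert(0, "S2HDD")      # flush the dirty hit first
        return steps
    if l1_hit:
        return ["S2H"]                    # serve directly from L1
    if l2_hit:
        return ["F2S", "S2H"]             # stage from L2 into L1, then serve
    return ["HDD2S", "S2H"]               # stage from the rotational drive
```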

Write Command

When firmware receives a Write command from the host, it derives the cache control block index (SDRAM Index) based on the host LBA. Then it checks the designated cache control block to determine whether the requested LBA is in L1 cache.

If L1 cache state is invalid (INVLD) and there is no ongoing locked operation (NOP), firmware starts transfer from host to L1 cache and updates cache sub-status to H2S. After completion of the host2sdram transfer, firmware updates cache sub-status to NOP. If the write data uses all of the cache line space, L1 cache state becomes VFD (e.g. 15036); otherwise L1 cache state becomes VPCD (e.g. 15037). For the case when write data uses all of the L1 cache line space, the copy in L2 cache becomes INVLD (e.g. 15038).

If L1 cache state is valid (VPCD, VFC, VFCPD, VFD, VPCC), there is no ongoing locked operation (NOP), and the associated cache contains the correct set of data, firmware starts transfer from host to L1 cache and updates cache sub-status to host2sdram. After completion of the host2sdram transfer, firmware updates cache sub-status to NOP.

If L1 previous cache state is VPCD, there are 4 options: (1) If write data uses all of the cache line space, L1 cache state becomes VFD (e.g. 15055). (2) If write data is less than the cache line space, there's no more free cache line space, and there's no more clean cache area, L1 cache state becomes VFD (e.g. 15057). (3) If write data is less than the cache line space and there's still some free cache line space, L1 cache state becomes VPCD (e.g. 15058). (4) If write data is less than the cache line space, there's no more free cache line space, and there's still some clean cache area, L1 cache state becomes VFCPD (e.g. 15060).

If L1 previous cache state is VFC, there are 2 options: (1) If write data uses all of the cache line space, L1 cache state becomes VFD (e.g. 15080). (2) If write data is less than the cache line space, L1 cache state becomes VFCPD (e.g. 15081), since not all of the cached data was overwritten.

If L1 previous cache state is VFCPD, there are 3 options: (1) If write data uses all of the cache line space, L1 cache state becomes VFD (e.g. 15087). (2) If write data is less than the cache line space and there's no more clean cache line space, L1 cache state becomes VFD (e.g. 15088). (3) If write data is less than the cache line space, and there's still some clean cache area, L1 cache state becomes VFCPD (e.g. 15089).

If L1 previous cache state is VFD, there is only 1 option: (1) L1 cache state remains at VFD no matter what the write data size is (e.g. 15105).

If L1 previous cache state is VPCC, there are 3 options: (1) If write data uses all of the cache line space, L1 cache state becomes VFD (e.g. 15115). (2) If write data is less than the cache line space and there's still some free cache line space, L1 cache state becomes VPCD (e.g. 15117). (3) If write data is less than the cache line space, there's no more free cache line space, and there's still some clean cache area, L1 cache state becomes VFCPD (e.g. 15116).
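
The per-state options listed above appear to reduce to three questions asked after the host write completes, regardless of the previous L1 cache state. The following sketch encodes that reading; it is an interpretation of the listed options rather than a rule stated in the figures, and the function name and parameters are hypothetical.

```python
def l1_state_after_write(covers_line, free_left, clean_left):
    """Next L1 state after a host write (H2S) completes.

    covers_line: the write data uses all of the cache line space
    free_left:   some cache line space is still un-cached after the write
    clean_left:  some cached data is still clean after the write
    """
    if covers_line:
        return "VFD"
    if free_left:
        return "VPCD"    # partially cached, and the new data is dirty
    if clean_left:
        return "VFCPD"   # fully cached, only part of it dirty
    return "VFD"         # fully cached and nothing clean remains
```

For instance, a partial write to a VFC line leaves no free space but some clean data, so the result is VFCPD, in agreement with option (2) for VFC.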

If L1 cache state is valid but the associated cache block does not contain the correct set of data (for the case of a directly-mapped cache), the firmware initiates freeing of that cache block. If that cache is clean, it can be freed instantly without any flush operation. But if the cache is dirty, firmware gets the associated flash physical location of the data from the LBA2FlashPBA table and initiates copying of the data to L2 cache, which is faster than flushing to the rotational drive. Then it updates sub-status to sdram2flash. However, if L2 cache is full, flushing to the rotational drive will be initiated instead, and sub-status will be set to sdram2hdd. Flushing of L2 cache to rotational drives can be done in the background when firmware is not busy servicing host commands. After flushing of L1, firmware proceeds with the steps below as if the data is not in L1 cache.

If data is not in L1 cache, firmware requests an available L1 cache. If there is no available L1 cache (L1 cache full), firmware selects an L1 cache victim. If the selected victim is clean, or if it is dirty but consistent with the copy in L2 cache, it is freed instantly. Otherwise, it is flushed to the rotational drive. The cache is invalidated (INVLD) and then assigned to the current command being serviced. Processing by the firmware continues as if the L1 cache state is INVLD (see discussion above).

After the L1 cache state is updated due to a host write (H2S), the L2 cache state is also updated. For the case when write data occupies only a part of the L1 cache line space and the write data did not cover all of the copy in L2 cache, the copy in L2 cache becomes partially valid (VPCC, VPCD), since some parts of the L2 cache copy are invalidated (whether partially or fully dirty previously) (e.g. 15039). For the case when write data occupies only a part of the L1 cache line space and the write data covered all of the copy in L2 cache, the copy in L2 cache becomes INVLD (e.g. 15038).

Upon completion of host2sdram (H2S), firmware sends command status to host and completes the command. But if the write command is of the write FUA (force unit access) type, host2sdram (H2S) and sdram2hdd (S2HDD) are done first before the command completion status is sent to the host. Once all L1 cache data is written to the HDD, L1 cache state becomes clean (VFC, VPCC) (e.g. 20060, 20042).

In the background, when the interface is not busy, firmware initiates flushing of L1 cache to L2 cache, L2 cache to rotational drives, and L1 cache to rotational drives.

Flushing Algorithm

For a fully associative cache implementing a write-back policy, flushing is usually done when there is new data to be placed in cache, but the cache is full and the selected victim data to be evicted from the cache is still dirty. Flushing will clean the dirty cache and allow it to be replaced with new data.

Flushing increases access latency due to the required data transfer from L1 volatile cache to the much slower rotational drive. The addition of L2 nonvolatile cache allows faster transfers from L1 to L2 cache when the L1 cache is full, effectively postponing the flushing operation and allowing it to be more optimized.

To reduce latency and enhance cache performance, flushing can be done as a background operation. LRU and LFU are the usual algorithms used to identify the victim data candidates, but the addition of a Fastest-to-Flush algorithm takes advantage of the random access performance of the L2 cache. It optimizes the flushing operation by selecting dirty victim data that can be written concurrently to the L2 cache, thus minimizing access time. The overhead brought about by flushing of cache can then be reduced by running concurrent flush operations whenever possible. Depending on processor availability, flushing may be scheduled regularly or during idle times when there are no data transfers between the hybrid storage system and the host or external device.
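
One possible reading of the Fastest-to-Flush idea is sketched below. The text does not specify the selection procedure, so the sketch rests on two assumptions: each dirty victim's destination flash device is known in advance, and writes to different flash devices can proceed concurrently while writes to the same device cannot. The function name and the input layout are hypothetical.

```python
def pick_concurrent_victims(lru_dirty_lines):
    """Select dirty victims that can be written to L2 at the same time.

    lru_dirty_lines: (line_id, flash_device) pairs, least recently used first.
    Returns the chosen line_ids, at most one per flash device.
    """
    chosen = []
    busy_devices = set()
    for line_id, device in lru_dirty_lines:
        if device not in busy_devices:    # this device is still free
            busy_devices.add(device)
            chosen.append(line_id)
    return chosen
```

Walking the candidates in LRU order keeps the usual replacement preference while skipping victims that would have to queue behind another write to the same device.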

Flushing of L1 Cache

Flushing of L1 cache will occur only if the copy of data in L1 cache is more recent than the copy in the rotational drive. This may occur, for example, when a non-FUA write command hits the L1 cache.

Flushing of L1 cache is triggered by the following conditions:

1. Eviction caused by shared cache line: In set-associative or directly-mapped caching mode, if the cache or cache set assigned to a specific address is valid but contains other data, that old data must be evicted to give way to the new data that needs to be cached. If the old data is clean, the cache is simply overwritten. If the old data is dirty, the cache is flushed first before writing the new data.

2. L1 cache is full: If an IO command being processed cannot obtain a cache due to a cache-full condition, a victim must be selected to give way to the current command. If the victim data is clean, the cache is simply overwritten. If the victim data is dirty, the cache is flushed first before writing the new data.

In either (1) or (2), the victim data will be moved to either L2 cache or rotational drive. Ideally in this case, firmware will move L1 cache data to L2 cache first, since movement to L2 cache is faster. Refer to “Movement from L1 Cache to L2 Cache” for a detailed discussion. In case the L2 cache is full, firmware will have to move L1 cache data to the rotational drive.

3. Interface is not busy: Flushing may also be done in the background when the drive is not busy servicing host commands. L1 cache is flushed directly to the rotational drive first; then, if the number of available L1 caches has reached a pre-defined threshold, data is also copied to L2 cache, in anticipation of more flushing due to an L1 cache full condition. Refer to “Movement from L1 Cache to L2 Cache” for a detailed discussion.
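The handling of a victim under conditions (1) and (2) above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the firmware's actual logic: the function name `choose_eviction_action`, the `Action` enumeration, and the two boolean inputs are hypothetical names introduced here.

```python
from enum import Enum


class Action(Enum):
    OVERWRITE = "overwrite in place"       # victim is clean, no flush needed
    MOVE_TO_L2 = "move L1 data to L2"      # dirty victim, L2 has room
    FLUSH_TO_HDD = "flush L1 data to HDD"  # dirty victim, L2 is full


def choose_eviction_action(victim_dirty: bool, l2_full: bool) -> Action:
    """Decide what to do with an L1 victim before new data is cached.

    Hypothetical helper: a clean victim is simply overwritten; a dirty
    victim goes to the faster L2 cache unless L2 is full, in which case
    it is flushed to the rotational drive.
    """
    if not victim_dirty:
        return Action.OVERWRITE
    if not l2_full:
        return Action.MOVE_TO_L2
    return Action.FLUSH_TO_HDD
```

For example, `choose_eviction_action(victim_dirty=True, l2_full=True)` selects the flush to the rotational drive, which is the slowest path and the one the L2 cache is meant to postpone.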

When moving data from L1 cache to rotational drive, the firmware takes advantage of concurrent drive operations by selecting cache lines that can be flushed in parallel among the least recently used candidates. The firmware also takes into consideration the resulting access type to the destination drives. The firmware queues the requests according to the values of the destination addresses such that the resulting access is of a sequential type.
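One possible way to picture this selection is sketched below. It assumes each dirty cache line records its destination drive, destination address, and a last-use counter; the `DirtyLine` fields, the `lru_window` parameter, and the function name are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class DirtyLine:
    line_id: int
    drive: int      # destination rotational drive (hypothetical field)
    lba: int        # destination address on that drive (hypothetical field)
    last_used: int  # smaller value = less recently used


def build_flush_queues(lines, lru_window):
    """Return {drive: [DirtyLine, ...]}, each queue sorted by address."""
    # Restrict the choice to the least recently used candidates.
    candidates = sorted(lines, key=lambda l: l.last_used)[:lru_window]
    queues = {}
    for line in candidates:
        # Lines bound for different drives can be flushed in parallel.
        queues.setdefault(line.drive, []).append(line)
    # Ascending addresses per drive approximate sequential access.
    for queue in queues.values():
        queue.sort(key=lambda l: l.lba)
    return queues
```

Each key of the returned dictionary corresponds to one drive that can be kept busy concurrently, and each queue is ordered so that the drive sees ascending addresses.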

Before firmware can initiate the flushing operation from L1 cache to rotational drive, it must first check if there is an ongoing locked cache operation. If there is an ongoing locked cache operation, the firmware will have to wait until the operation is finished before initiating the data transfer. When the current cache sub-state finally becomes NOP, it will be changed to S2HDD and the L1 cache flushing will start. This change in cache sub-state indicates a new locked cache operation. After the L1 cache is flushed, cache sub-state goes back to NOP to indicate that the cache is ready for another operation.
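The sub-state acts as a lock on the cache line. A minimal single-threaded sketch of that idea follows; the `CacheLine` class and its method names are hypothetical, and a real firmware implementation would also need whatever atomicity its execution model requires. Here the caller is expected to retry later instead of blocking when the line is busy.

```python
from enum import Enum


class SubState(Enum):
    NOP = "NOP"      # no locked operation in progress
    S2HDD = "S2HDD"  # L1 (SDRAM) to rotational drive transfer in progress


class CacheLine:
    def __init__(self):
        self.sub_state = SubState.NOP

    def try_begin_flush(self) -> bool:
        """Claim the line for an L1-to-HDD transfer if no operation holds it."""
        if self.sub_state is not SubState.NOP:
            return False  # another locked operation is in progress
        self.sub_state = SubState.S2HDD
        return True

    def end_flush(self) -> None:
        """Release the line once the transfer has finished."""
        if self.sub_state is not SubState.S2HDD:
            raise RuntimeError("no L1-to-HDD transfer in progress")
        self.sub_state = SubState.NOP
```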

FIGS. 20A and 20B list the valid combinations of L1 and L2 cache states and cache sub-state values that will allow data transfers from L1 cache to rotational drive. They also show the resulting cache states and cache sub-state values when an L1 cache to rotational drive data transfer is initiated, and when it is completed. The L1 cache to rotational drive data transfer may be initiated by an L1 cache flushing operation or a write FUA operation. The succeeding discussion will focus on the L1 cache flushing operation.

The L1 cache flushing operation may be initiated only for valid but dirty L1 cache states: either VPCD (rows 2006, 2007, 2009, 20011), VFCPD (rows 20018, 20019, 20021, 20023) or VFD (rows 20024, 20025, 20028). Upon completion of the flushing operation, the L1 cache is declared clean. If the flushing operation was not completed, the cache page bitmap is updated to reflect the dirty bytes that were cleaned. The L2 cache state and cache page bitmap are also updated accordingly.
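The bookkeeping for the L1 side can be sketched as follows. The mapping of dirty states to clean states is taken from the cases discussed in this section; the integer encoding of the cache page bitmap (bit i set means page i is dirty) and both function names are assumptions made for illustration. The resulting state for an incomplete flush depends on the row in FIGS. 20A and 20B and is deliberately not modeled.

```python
# L1 state reached when every dirty byte has been flushed (from the text).
CLEAN_STATE_AFTER_FULL_FLUSH = {"VPCD": "VPCC", "VFCPD": "VFC", "VFD": "VFC"}


def clear_flushed_pages(dirty_bitmap: int, flushed_mask: int) -> int:
    """Return the dirty bitmap with the flushed pages marked clean.

    Assumed encoding: bit i set means page i of the cache line is dirty.
    """
    return dirty_bitmap & ~flushed_mask


def l1_state_after_flush(state: str, remaining_dirty: int):
    """Return the clean L1 state if nothing is left dirty, else None.

    None means the flush was incomplete; the resulting state for that
    case follows the figures and is not modeled in this sketch.
    """
    if state not in CLEAN_STATE_AFTER_FULL_FLUSH:
        raise ValueError(f"{state} is not a dirty L1 cache state")
    if remaining_dirty == 0:
        return CLEAN_STATE_AFTER_FULL_FLUSH[state]
    return None
```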

In the first case 2006, L1 cache state is VPCD and L2 cache state is INVLD. An example case is when the partially cached data in L1 was updated by a write operation and is now inconsistent with the data in the rotational drive, but there is no copy yet in the L2 cache. If all the dirty data are flushed 20042, L1 cache state is changed to VPCC to indicate that the partially cached data is now consistent with data in the rotational drive. However, if not all dirty bytes were flushed 20043, L1 cache state stays at VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned. L2 cache state stays INVLD.

In the second case 2007, both L1 and L2 cache states are VPCD. An example case is when the partially cached dirty data in L1 was initially evicted to L2, then a cache miss happens and data is partially cached in L1. L1 was then updated by a write operation. This will also occur when the partially cached dirty data in L1 was initially evicted to L2, then an L2 cache hit occurs and L2 data is copied back to L1. L1 and L2 can have the same data, but they can also have different data if the L1 cache is subsequently updated by a write operation. If L1 and L2 have the same data and all the dirty data are flushed 20044, both L1 and L2 cache states are changed to VPCC to indicate that the partially cached data is now consistent with data in the rotational drive. If L1 and L2 have the same data but not all dirty bytes in L1 were flushed 20045, L1 and L2 cache states stay at VPCD, with the cache page bitmap updated to reflect which pages were cleaned. If L1 and L2 have different data and all the dirty data are flushed 20046, L1 cache state is changed to VPCC to indicate that the partially cached data is now consistent with data in the rotational drive. Since L2 cache contains different data, L2 cache state stays at VPCD. If L1 and L2 have different data and not all dirty bytes were flushed 20047, L1 cache state stays at VPCD but the cache page bitmap is updated to reflect which dirty bytes were cleaned. Since L2 cache contains different data, L2 cache state stays at VPCD.

In the third case 2009, L1 cache state is VPCD and L2 cache state is VFCPD. An example case is when the fully cached dirty data in L1 was initially evicted to L2, then an L2 cache hit occurs and some L2 cache data is copied back to L1. L1 dirty data can initially be the same as L2 dirty data, but they can differ if the L1 cache is subsequently updated by a write operation. If all the dirty data in L1 and L2 are flushed 20049, L1 cache state is changed to VPCC and L2 cache state is changed to VFC to indicate that cached data in both locations is now consistent with data in the rotational drive. If not all dirty bytes in L1 were flushed 20050, L1 cache state stays at VPCD and L2 cache state stays at VFCPD, with the cache page bitmap updated to reflect which dirty bytes were cleaned. If all dirty bytes in L1 were flushed but the flush does not cover all the dirty bytes in L2 20051, only the L1 cache state is changed to VPCC. L2 cache state stays at VFCPD, with the cache page bitmap updated to reflect which dirty bytes were cleaned.

In the fourth case 20011, L1 cache state is VPCD and L2 cache state is VPCC. An example case is when clean, partially cached data in L1 was initially evicted to L2, then an L2 cache hit occurs, L2 cache data is copied back to L1, and a subsequent write operation updated the data in L1 cache. This will also occur when a cache miss occurs, data is partially cached in L1, and a subsequent write operation updated the data in L1 cache. If all the dirty data are flushed 20053, L1 cache state is changed to VPCC to indicate that the partially cached data is now consistent with data in the rotational drive. However, if not all dirty bytes were flushed 20054, L1 cache state stays at VPCD, with the cache page bitmap updated to reflect which dirty bytes were cleaned. In both cases, L2 cache state stays at VPCC since it is not affected by the L1 cache flushing operation.

In the fifth case 20018, L1 cache state is VFCPD and L2 cache state is INVLD. An example case is when fully cached data in L1 is updated by a write operation and is now inconsistent with data in the rotational drive, but there is no copy yet in the L2 cache. If all the dirty data are flushed 20061, L1 cache state is changed to VFC to indicate that the fully cached data is now consistent with data in the rotational drive. If not all dirty bytes were flushed 20062, L1 cache state stays at VFCPD, with the cache page bitmap updated to reflect which dirty bytes were cleaned. L2 cache state stays INVLD.

In the sixth case 20019, L1 cache state is VFCPD and L2 cache state is VPCD. An example case is when partially cached dirty data in L1 was initially evicted to L2, then an L2 cache hit occurs, L2 cache data is copied back to L1, and another read operation completes the cache line. A subsequent write operation may also add more dirty bytes in L1. If all dirty bytes in L1 and L2 were flushed 20063, L1 cache state is changed to VFC and L2 cache state is changed to VPCC to indicate that the cached data is now consistent with the data in the rotational drive. If all dirty bytes in L1 were flushed but the flush does not cover all dirty bytes in L2 20064, L1 cache state is changed to VFC but L2 cache state stays at VPCD, with the cache page bitmap updated to reflect which dirty bytes were cleaned. If not all dirty bytes were flushed 20065, L1 cache state stays at VFCPD and L2 cache state stays at VPCD, with the cache page bitmap updated to reflect which dirty bytes were cleaned.

In the seventh case 20021, both L1 and L2 cache states are VFCPD. An example case is when fully cached, partially dirty data in L1 was initially evicted to L2, and then an L2 cache hit occurs and L2 cache data is copied back to L1. If all the dirty data are flushed 20066, both L1 and L2 cache states are changed to VFC to indicate that the fully cached data is now consistent with the data in the rotational drive. If not all dirty bytes were flushed 20066, L1 and L2 cache states become VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned.

In the eighth case 20023, L1 cache state is VFCPD and L2 cache state is VPCC. An example case is when clean, partially cached data was initially evicted to L2, then a read operation completed the L1 cache, and a subsequent write operation made the L1 cache partially dirty. If all the dirty data are flushed 20069, the L1 cache state becomes VFC to indicate that the fully cached data is now consistent with the data in the rotational drive. If not all the dirty data are flushed 20070, L1 cache state becomes VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned. L2 cache state stays at VPCC.

In the ninth case 20024, L1 cache state is VFD and L2 cache state is INVLD. An example case is when the fully cached data in L1 becomes fully inconsistent with the rotational drive due to a write operation, but there is no copy yet in the L2 cache. If all the dirty data are flushed 20071, the L1 cache state becomes VFC to indicate that the fully cached data is now consistent with the data in the rotational drive. If not all the dirty data are flushed 20072, L1 cache state becomes VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned. Since L2 is not involved in the flushing operation, L2 cache state stays INVLD.

In the tenth case 20025, L1 cache state is VFD and L2 cache state is VPCD. An example case is when partially cached dirty data was initially evicted to L2, and a subsequent write operation made the L1 cache completely dirty. If all the dirty data are flushed 20073, the L1 cache state becomes VFC and the L2 cache state becomes VPCC to indicate that cached data is now consistent with the data in the rotational drive. If not all the dirty cache data were flushed (20074, 20075), L1 cache state becomes VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned. If the L1 flushing operation did not cover all L2 dirty data 20075, L2 cache state stays at VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned. Otherwise, if the L1 flushing operation covered all L2 dirty data 20074, L2 cache state becomes VPCC.

In the eleventh case 20028, both L1 and L2 cache states are VFD. An example case is when fully dirty data was initially evicted to L2, and then an L2 cache hit occurs and L2 cache data is copied back to L1. If all the dirty data are flushed 20078, L1 and L2 cache states become VFC to indicate that the fully cached data is now consistent with the data in the rotational drive. If not all the dirty data are flushed 20079, L1 and L2 cache states become VPCD, with the cache page bitmap updated to reflect the dirty bytes that were cleaned.

Criteria for Choosing L1 Cache Victims

1. LRU: Least Recently Used data is chosen for invalidation ahead of more recently used data.

2. Fastest to Flush: Groups of data that can be flushed to rotational drives concurrently, and that will form sequential accesses to rotational drives, will be prioritized. In moving data from L1 to L2 cache, groups of data that can be moved to L2 cache concurrently will be prioritized.

Flushing of L2 Cache

Flushing of L2 cache will occur only if the copy of data in L2 cache is more recent than the copy in the rotational drive, and the copy in L1 cache has already been invalidated. This may occur, for example, when dirty data has been evicted from the L1 cache and the firmware transferred it to the faster L2 cache instead of the rotational drive. In moving data from L2 cache to rotational drive, the firmware takes advantage of the data distribution among the flash chips and among the rotational drives to maximize parallelism.

Rather than deciding based on the LRU algorithm alone, firmware takes into consideration the source and target physical locations of the data that needs to be moved from flash to rotational drive. As shown in FIG. 13, moving data from L2 cache to rotational drive can be optimized by taking into account which data can be flushed to rotational drive concurrently.

Similarly, the firmware also takes advantage of the speed of rotational drives in sequential access. Data movements are therefore queued in such a way that writes to the rotational drives are mostly sequential accesses rather than random ones.

Flushing of the L2 cache consists of a two-step data transfer: transfer from L2 to L1, and transfer from L1 to rotational drive.

FIGS. 17A and 17B list the valid combinations of L1 and L2 cache states and cache sub-state values that will allow data transfers from L2 cache to L1 cache. They also show the resulting cache states and cache sub-state values when an L2 cache to L1 cache data transfer is initiated, and when it is completed. The L2 cache to L1 cache data transfer may be initiated by an L2 cache flushing operation or an L2 cache hit. The succeeding discussion focuses on the data transfer due to an L2 cache flushing operation.

A flushing operation is only done when L2 cache is dirty (L2 cache state is VPCD, VFCPD or VFD) and the dirty bytes in L2 cache do not correspond to the dirty bytes in the L1 cache. Upon completion of the L2 cache to L1 cache transfer of dirty data, the L1 cache will contain all dirty bytes in L2 cache. The flushing operation is then completed by an L1 to rotational drive transfer. The succeeding discussion focuses on the L2 cache to L1 cache data transfer due to an L2 cache flushing operation. See the section “Flushing of L1 Cache” for the detailed discussion of the L1 cache to rotational drive data transfer.

Before firmware can initiate the flushing operation by transferring data from L2 cache to L1 cache, it must first check if there is an ongoing locked cache operation. If there is an ongoing locked cache operation, the firmware will have to wait until the operation is finished before initiating the data transfer. When the current cache sub-state finally becomes NOP, it will be changed to F2S and the L2 cache flushing will be initiated. This change in cache sub-state indicates a new locked cache operation. After the L2 cache to L1 cache data transfer is completed, cache sub-state goes back to NOP to indicate that the cache is ready for another operation.

In the first case 17001, L2 cache state is VPCD and L1 cache state is INVLD. An example case is when partially cached dirty data in L1 was evicted to L2. After dirty data in L2 is transferred to L1, L1 cache state is changed to VPCD 17037, with the cache page bitmap updated to reflect the new dirty bytes in L1. L2 cache state stays at VPCD.

In the second case 17003, L2 cache state is VFCPD and L1 cache state is INVLD. An example case is when fully cached, partially dirty data in L1 was evicted to L2. After dirty data in L2 is transferred to L1, L1 cache state is changed to VPCD 17041, with the cache page bitmap updated to reflect the new dirty bytes in L1. L2 cache state stays at VFCPD.

In the third case 17004, L2 cache state is VFD and L1 cache state is INVLD. An example case is when fully dirty data in L1 was evicted to L2. After all dirty data in L2 is transferred to L1, L1 cache state is changed to VFD 17042. If not all dirty data in L2 is transferred to L1, L1 cache state is changed to VPCD 17041, with the cache page bitmap updated to reflect the new dirty bytes in L1. L2 cache state stays at VPCD.

In the fourth case 17007, both L1 and L2 cache states are VPCD. An example case is when the partially cached dirty data in L1 was initially evicted to L2, then a cache miss happens and data is partially cached in L1. L1 was then updated by a write operation. This results in some dirty data in L2 that is not present in L1. After dirty data in L2 is transferred to L1, L1 cache state is still VPCD 17047, with the cache page bitmap updated to reflect the new dirty bytes in L1. L2 cache state stays at VPCD.

In the fifth case 17009, L2 cache state is VFCPD and L1 cache state is VPCD. An example case is when fully cached, partially dirty data in L1 was initially evicted to L2, and then a subsequent write operation created dirty bytes in L1 that are not in L2. If the dirty data transferred from L2 to L1 completes the L1 cache line, L1 cache state becomes VFD 17049. If the L1 cache line is not completed, L1 cache state becomes VFCPD 17050. L2 cache state becomes INVLD in both cases.

In the sixth case 17010, L2 cache state is VFD and L1 cache state is VPCD. An example case is when fully dirty data in L1 was initially evicted to L2, and then a subsequent write operation created dirty bytes in L1 that are not in L2. If the dirty data transferred from L2 to L1 completes the L1 cache line, L1 cache state becomes VFD 17051. If the L1 cache line is not completed, L1 cache state becomes VFCPD 17051. The cache page bitmap is updated to reflect the new dirty bytes in L1. L2 cache state remains VFD in both cases.

In the seventh case 17031, L2 cache state is VPCD and L1 cache state is VPCC. An example case is when partially cached dirty data in L1 was initially evicted to L2, and a cache miss occurs during a read operation. If the dirty data transferred from L2 to L1 completes the L1 cache line, L1 cache state becomes VFCPD 17074. If the L1 cache line is not completed, L1 cache state becomes VPCD 17075. The cache page bitmap is updated to reflect the new dirty bytes in L1. L2 cache state remains VPCD in both cases.

Criteria for Choosing L2 Cache Victims

1. LRU: Least Recently Used data is chosen for invalidation ahead of more recently used data.

2. Fastest to Fetch: Groups of data that can be fetched from flash devices concurrently, and therefore require less time, will be prioritized.

3. Fastest to Flush: Groups of data that can be flushed to rotational drives concurrently, and that will form sequential accesses to rotational drives, will be prioritized.

The drawing in FIG. 14 shows an example scenario where four flash devices all have dirty blocks that need to be flushed to the two rotational drives. The following are the steps to flush the dirty L2 cache blocks to the rotational drives using the “Fastest to Fetch” and “Fastest to Flush” criteria.

1. Allocate resources for the maximum number of flash DMA engines that could simultaneously transfer data from flash to SDRAM, given the list of dirty blocks.

2. Among the groups of data that can be fetched simultaneously from flash, choose the blocks that are sequentially closer in the rotational drives. Start transferring data from flash to SDRAM. Activate as many concurrent operations as possible.

3. When a transfer has completed, start moving data from SDRAM to rotational drives.

For example, if the FLASH2SDRAM transfers of FDEV01:BLK03, FDEV02:BLK01, and FDEV03:BLK02 have already completed, start SDRAM2HDD movement of FDEV01:BLK03 and FDEV02:BLK01 first, since they are going to separate rotational drives. FDEV01:BLK03 is selected over FDEV03:BLK02 because FDEV01:BLK03's location in HDD1 is sequentially lower than FDEV03:BLK02's location, therefore achieving greater potential for sequential access. Repeat this every time a transfer from flash to SDRAM completes.
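The three steps and the example above can be sketched in a few lines. This is a simplified illustration, not the claimed method: the `Block` record, the function names, and the drive numbers and addresses used in the usage note are hypothetical, and "sequentially closer" is approximated by preferring the lowest destination address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    flash_dev: int  # flash device holding the dirty block
    hdd: int        # destination rotational drive
    lba: int        # destination address on that drive (illustrative)


def pick_fetch_batch(dirty):
    """Steps 1-2: at most one block per flash device, lowest address first,
    so that all selected flash-to-SDRAM transfers can run concurrently."""
    batch = {}
    for blk in sorted(dirty, key=lambda b: b.lba):
        batch.setdefault(blk.flash_dev, blk)
    return list(batch.values())


def pick_flush_batch(in_sdram):
    """Step 3: at most one block per rotational drive, lowest address first,
    so that drives work in parallel and each sees ascending addresses."""
    batch = {}
    for blk in sorted(in_sdram, key=lambda b: b.lba):
        batch.setdefault(blk.hdd, blk)
    return sorted(batch.values(), key=lambda b: b.name)
```

With FDEV01:BLK03 and FDEV03:BLK02 both bound for drive 1 at assumed addresses 100 and 700, and FDEV02:BLK01 bound for drive 2, `pick_flush_batch` returns FDEV01:BLK03 and FDEV02:BLK01, matching the selection described in the example.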

Movement from L1 Cache to L2 Cache

Once the actual amount of storage being used by the application has grown considerably, the chances of L1 cache hits will be lower and the chances of the L1 cache being full will be greater. This is the case where the presence of data in L2 cache can significantly improve the performance of the system. When the firmware detects that the percentage of used L1 cache has reached a pre-defined threshold, it starts copying data to L2 cache in the background during idle times. The more data there is in the L2 cache, the lower the chances that the firmware will have to access data in the rotational drives.

If a directly-mapped L1 caching scheme is used, it is possible that only a small percentage of available L1 is being utilized, and some of the frequently accessed data blocks are mapped to the same L1 cache entry, therefore requiring frequent eviction of those associated cache blocks. In such situations, it is also helpful if those frequently accessed and frequently evicted data are stored in L2 cache for faster access. A method to identify these blocks of data is to keep track of the data access counts. If the access count of a block belonging to the LRU list reaches a pre-defined threshold, the firmware will copy it to L2 cache. This method is a combination of the LRU and LFU (Least Frequently Used) algorithms, which implies that the most recently used and most frequently used data should be prioritized by the caching scheme.
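The access-count test can be sketched as follows. The function name, the dictionary of counts, and the block identifiers are hypothetical; how counts are maintained and reset is not specified here and is left out of the sketch.

```python
def blocks_to_promote(lru_list, access_counts, threshold):
    """Return blocks on the LRU list whose access count reached the threshold.

    lru_list: block identifiers currently on the LRU list.
    access_counts: mapping of block identifier to number of accesses.
    threshold: pre-defined access count that triggers a copy to L2 cache.
    """
    return [blk for blk in lru_list if access_counts.get(blk, 0) >= threshold]
```

For instance, with counts `{"A": 12, "B": 3}` and a threshold of 10, only block `"A"` would be copied to L2 cache.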

In moving data from L1 cache to L2 cache, the firmware takes advantage of concurrent flash device operations by selecting cache lines that can be flushed in parallel among the candidates. As stated earlier, the L1 cache to L2 cache data movement is initiated by two possible conditions.

First, this operation is initiated during host reads or writes to data that is not cached in the L1 cache, specifically during an L1 cache full condition. In this case, an L1 entry should be freed and the firmware determines that the associated data should be transferred to the L2 cache. The motivation for opting to transfer first to the L2 cache instead of flushing back to the HDD in this situation is that L1 to L2 transfers can be performed faster. Completing the transfer faster allows quicker freeing of L1 space and improves response time to the host read or write request.

Second, the L1 cache to L2 cache data movement is initiated by the background process that maintains the threshold for the minimum number of available L1 cache lines, when the firmware determines that the associated data should be transferred to the L2 cache. The motivation for opting to transfer first to the L2 cache instead of flushing back to the HDD in this situation applies when the associated data is among the most frequently accessed data, but less recently used than other such data. This avoids the cache full condition, but since the data is still frequently used, it is preferable to keep a copy in the L2 cache so it can be retrieved faster.

For both conditions, the operation commences with the selection of L1 cache lines that will next be transferred to the L2 cache. The selection categories shall be applied only to those cache line information table entries that have NOP sub-state, i.e. those that are not undergoing any other data movement. They shall also only be applied if the cache state table entry specifies that the data in the L1 cache is a more updated copy than that in the L2 cache or that the data is cached only in L1. Hence, there are eleven possible initial cache states for table entries in the cache line information table that will proceed with L1 to L2 transfer.
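The eligibility test can be written as a simple lookup. The eleven state pairs below are the initial (L1, L2) states of the cases enumerated in the paragraphs that follow; representing states and sub-states as strings, and the function name itself, are assumptions made for this sketch.

```python
# (L1 state, L2 state) pairs that may start an L1-to-L2 transfer,
# collected from the eleven cases described in the text.
L1_TO_L2_INITIAL_STATES = {
    ("VFC", "INVLD"),
    ("VPCD", "INVLD"),
    ("VPCD", "VPCD"),
    ("VPCD", "VPCC"),
    ("VFCPD", "INVLD"),
    ("VFCPD", "VPCD"),
    ("VFCPD", "VPCC"),
    ("VPCC", "INVLD"),
    ("VPCC", "VPCD"),
    ("VPCC", "VPCC"),
    ("VFD", "INVLD"),
}


def eligible_for_l1_to_l2(l1_state: str, l2_state: str, sub_state: str) -> bool:
    """True if the entry is idle (NOP) and its states allow an L1-to-L2 copy."""
    return sub_state == "NOP" and (l1_state, l2_state) in L1_TO_L2_INITIAL_STATES
```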

In the first case 18012, the selected entry's L1 cache state is VFC and its L2 cache state is INVLD. This occurs if the data in the L1 cache had already been flushed to the HDD but there is no copy in L2 cache. To indicate that L1 is being copied to L2, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18050, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VFC also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the second case 18006, the selected entry's L1 cache state is VPCD and its L2 cache state is INVLD. In this case the data copy in L1 has had updates in some parts and is now inconsistent with the data counterpart in the HDD, but there is no copy yet in the L2. To indicate that L1 is being copied to L2, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18042, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VPCD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the third case 18007, the selected entry's L1 cache state is VPCD and its L2 cache state is also VPCD. This occurs when partially dirty data exists in both L2 and L1 but the parts are not exactly the same, so that data in L1 is not consistent with data in L2. To indicate that the updated data parts in L1 are being copied to the L2 cache, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18043, the L2 copy now also contains the updates from the L1 copy. If the updates fill up all unfilled bytes in the L2 cache, then the L2 cache state is changed to VFD. If not, the L2 cache state remains VPCD but the L2 cache contains the complete copy of the dirty bytes. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the fourth case 18011, the selected entry's L1 cache state is VPCD and its L2 cache state is VPCC. This occurs when partial data is cached in L1 and some or all of that data is dirty. Partial data is also cached in L2, but the data in L2 is consistent with that in the HDD. Hence none of the updated parts in L1 are in L2 yet. To indicate that L1 is being copied to L2, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18048, the L2 copy now also contains the updates from the L1 copy. If the updates fill up all unfilled bytes in the L2 cache, then the L2 cache state is changed to VFCPD, since the L2 had formerly clean bytes but was filled up with some dirty bytes from L1. If the updates do not fill up the L2 cache, the L2 cache state is VPCD but the L2 cache contains the complete copy of the dirty bytes. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the fifth case 18018, the selected entry's L1 cache state is VFCPD and its L2 cache state is INVLD. In this case the data is fully cached in L1 but has had updates in some parts and is now inconsistent with the data counterpart in the HDD. There is no valid copy in the L2 cache. To indicate that L1 is being copied to L2, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18056, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VFCPD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the sixth case 18019, the selected entry's L1 cache state is VFCPD and the L2 cache state is VPCD. In this case the data is fully cached in L1 but has had updates in some parts. The data is not fully cached in L2, but some parts of the L2 data are updated. Some or all of the updated data parts in L2 are not in L1. To indicate that the inconsistent data parts are being copied to L2, the cache sub-state is changed from NOP to S2F. Upon successful completion of the data transfer 18057, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VFCPD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the seventh case 18023, the selected entry's L1 cache state is VFCPD and the L2 cache state is VPCC. The data is fully cached in L1 but has had updates in some parts. The data is not fully cached in L2, but all data in the L2 cache is clean. To indicate that the inconsistent data parts are being copied to L2, the cache sub-state is changed from NOP to S2F. Upon successful completion of the data transfer 18061, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VFCPD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the eighth case 18030, the L1 cache state is VPCC and the L2 cache state is INVLD. In this case the data is fully cached in L1 but has had updates in some parts and is now inconsistent with the data counterpart in the HDD. There is no valid copy in the L2 cache. To indicate that L1 is being copied to L2, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18056, the L2 copy is now consistent with the L1 copy and the L2 cache state will be changed to VFCPD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the ninth case 18031, the selected entry's L1 cache state is VPCC and the L2 cache state is VPCD. In this case the data is partially cached in L1 and the copy is clean. The data is also not fully cached in L2, but some or all parts of the L2 data are updated. To indicate that the data that is in L1 but not in L2 is being copied to L2, the cache sub-state is changed from NOP to S2F. Upon successful completion of the data transfer, if the updates from L1 did not fill all unfilled bytes in L2, then L2 state will remain at VPCD. If the updates fill up all unfilled bytes in L2, then L2 state will change to VFCPD. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the tenth case 18035, the selected entry's L1 cache state is VPCC and the L2 cache state is also VPCC. This means the data is not fully cached in either L1 or L2, the two caches contain different data, and the data in both caches is clean. To indicate that the data parts in L1 are being copied to L2, the cache sub-state is changed from NOP to S2F. Upon successful completion of the data transfer 18073, if the updates from L1 did not fill all unfilled bytes in L2, then L2 state will remain at VPCC. If the updates fill up all unfilled bytes in L2, then L2 state will change to VFC. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In the eleventh case 18024, the L1 cache state is VFD and the L2 cache state is INVLD. This occurs when the data counterpart in the HDD was entirely updated in L1 cache but there is no copy yet in the L2 cache, or there is a copy in the L2 cache but it was invalidated because the data was entirely replaced. To indicate that the data in L1 is being copied to the L2 cache, the cache sub-state will be changed from NOP to S2F. Upon successful completion of the data transfer 18062, the L2 copy is now consistent with the L1 copy and the L2 cache state is changed to VFD also. The cache sub-state will return to NOP to indicate that the data is no longer in transit. The firmware may now opt to free the L1 space for use by other entries and set the L1 cache state to invalid.

In all cases, while the data is in transit from L1 to L2, i.e. while the cache sub-state is S2F, any request from the host to update the data can still be accepted by aborting the pending L1 to L2 data transfers. To indicate that the data from the host is being accepted, the cache sub-state will be changed from S2F to H2S. If the host updated the data in L1 entirely, then after accepting the data the L1 cache state will be VFD and the L2 cache state will be invalidated. If the host did not update the data in L1 entirely but any of the data already transferred to the L2 cache was among the data updated, then the L2 cache state will also be invalidated and the L1 cache state will be VFCPD or VPCD. The sub-state can be changed from H2S to NOP after accepting the update to indicate that there is no more data transfer going on. If the firmware still opts to make a copy of the data in L2 cache, then it will have to re-initiate the L1 to L2 data movement operation with the L2 cache state initially INVLD and with L1 cache state VFD (case 11 18024) or VFCPD (case 5 18018) or VPCD (case 2 18006). If the host updated those data parts in L1 that have not been transferred to L2 cache and those data parts that have been copied to L2 were untouched, then the pending L1 to L2 transfers can actually proceed using the more updated data parts in L1. The sub-state can be changed from H2S back to S2F and the firmware can proceed with the previously aborted L1 to L2 transfer. As much as possible, the firmware avoids the situation where an L1 to L2 transfer gets aborted by selecting least recently used entries for this transfer. This lessens the probability that the host will update that particular data.
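The handling of a host update that arrives while the sub-state is S2F can be sketched as follows. This is only an illustration under stated assumptions: the function name `resolve_host_update`, the per-sector view of a cache line, and the `copied` and `updated` sets are hypothetical and do not appear in the description above.

```python
def resolve_host_update(total_sectors, copied, updated):
    """Decide the fate of an aborted L1-to-L2 copy after a host update.

    copied  -- set of sector indices already transferred to L2 (assumed model)
    updated -- set of sector indices the host has just rewritten in L1
    Returns (l2_invalidated, next_sub_state).
    """
    if len(updated) == total_sectors:
        # Host replaced the whole line: L1 becomes VFD, the L2 copy is stale.
        return True, "NOP"
    if copied & updated:
        # Sectors already copied to L2 were overwritten: L2 copy is stale,
        # and a new L1-to-L2 copy must be started from scratch if desired.
        return True, "NOP"
    # Only not-yet-copied sectors changed: the aborted transfer may resume.
    return False, "S2F"
```

For example, with a four-sector line of which sectors 0 and 1 have already been copied, a host update of sector 3 alone lets the transfer resume, while an update touching sector 1 invalidates the L2 copy.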

Read Buffering

In some applications such as video streaming, storage accesses are typically large sequential reads. In such cases, it is more efficient to allocate a certain amount of high-speed buffer that can be used to store data from the flash media or the rotating drives for immediate forwarding to the host through the host IO interface. Every time the host issues a read command, the firmware checks if the data is in L1. If not, it checks if the data is in L2. If the data is in L2, the firmware fetches it, stores it in the high-speed buffer, and immediately transfers it to the host. If the data is not in L2, the firmware fetches it from the rotating drives, stores it in the buffer, and forwards it to the host. This scheme further improves performance by creating a dependency link between the DMA controllers, such that the completion of a specific data transfer (e.g. flash to buffer) may trigger the start of another data transfer (e.g. buffer to host), without intervention from the local processors.
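The lookup order and the chained transfers described above can be sketched as below. The dictionaries standing in for L1, L2 and the rotating drives, the function name `read_path`, and the transfer labels are assumptions made for illustration; real DMA chaining happens in hardware, not in a Python list.

```python
def read_path(lba, l1, l2, hdd):
    """Return (data, transfers) for a host read of one cache line.

    l1, l2, hdd -- dicts mapping an LBA to its data (hypothetical model)
    transfers   -- ordered list of the data movements that were needed
    """
    if lba in l1:
        return l1[lba], ["L1->host"]
    if lba in l2:
        source, first_hop = l2, "flash->buffer"
    else:
        source, first_hop = hdd, "hdd->buffer"
    buffer = source[lba]  # first transfer fills the high-speed buffer
    # Completion of the first transfer triggers the second one
    # (the dependency link), so no processor step sits between them.
    return buffer, [first_hop, "buffer->host"]
```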

In FIG. 1, the Read Buffering scheme can be implemented using the internal SRAM 114 which has a dedicated data link to the flash interface 108 and the other IO interfaces 106 and 107. Here, the SRAM 114 can be used as the high-speed buffer for moving data from flash 109 to host 112 or from rotating drives 105 to host 112.

Power-Loss Data Recovery

For hybrid devices equipped with back-up power such as those illustrated in FIG. 2 to FIG. 9 of U.S. Pat. No. 7,613,876, entitled “Hybrid Multi-Tiered Caching Storage System”, the non-volatile L2 cache will serve as temporary storage for dirty data that has not been flushed to the hard drives at the instant the power loss occurred. The limited speed of a hard drive and its high power requirement make it impractical to provide a back-up power supply capable of keeping the device alive while flushing all dirty data from L1 and L2 to the hard drives. The flash-based L2 requires less power and allows faster saving of data due to its capability to execute simultaneous operations on multiple flash devices.

FIG. 21A shows an example state of L1 and L2 during normal IO operations, before an external power loss occurs. When the firmware detects the loss of external power, any dirty copies of data in L1 will be moved to L2 and the corresponding cache line information indicating the validity of the copy in L2 will be saved to non-volatile memory accordingly, as shown in FIG. 21B and FIG. 21C. Similarly, dirty data in L2 that has no copy in L1 will be kept in L2 and the corresponding cache line information will be saved also. The firmware assumes that the back-up power supply has enough charge to allow completion of data transfer operations from L1 to L2. When external power resumes, the device can proceed to its normal boot-up sequence since the state of all data has been saved in the cache line information. When the host tries to read data whose latest copy is still in L2 after the previous power interruption, the firmware will read the corresponding cache line information from non-volatile memory and find out that the L2 cache is dirty, as shown in FIG. 21D. For example, if the L2 cache is full dirty and the host is trying to read the entire cache line, the firmware will copy the data from L2 to L1 (17042) and give that copy in L1 to the host (16064). If the host sends a Read FUA command instead of a normal Read command, the firmware will fetch the data from L2 to L1 (17042), flush it from L1 to the hard drive (20078), and finally read the copy in the hard drive that has just been updated and send it to the host (19056 followed by 16050).
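A minimal sketch of the power-loss step follows. It assumes each cache level can be modelled as a dictionary from LBA to a (state, data) pair and that the dirty states are VPCD, VFCPD and VFD; the function name `on_power_loss` is hypothetical, and merging of partially cached L1 and L2 copies is deliberately left out.

```python
DIRTY_STATES = {"VPCD", "VFCPD", "VFD"}

def on_power_loss(l1, l2):
    """Move dirty L1 lines into L2 and return the cache line info to persist.

    l1, l2 -- dicts mapping an LBA to a (state, data) tuple (assumed model)
    """
    for lba, (state, data) in l1.items():
        if state in DIRTY_STATES:
            l2[lba] = (state, data)  # the dirty copy now lives in flash
    l1.clear()  # volatile contents do not survive the power loss
    # Persist only what is needed to know which L2 copies are still dirty.
    return {lba: state for lba, (state, _) in l2.items()
            if state in DIRTY_STATES}
```

After power returns, the saved mapping is what lets the firmware see that the latest copy of a line is still in L2 rather than on the hard drive.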

FIG. 22 illustrates a hybrid storage device 2101 connected directly to the host 112 and to the rotational drives 105 through the storage controller's available IO interface DMA controllers 107 and 106 respectively, in accordance with another embodiment of the invention. Components in FIG. 22 that are similarly shown in FIG. 1 and/or other drawings will have the same or similar functionalities as described above and will not be repeated for purposes of brevity.

The rotational drives 105 are connected to one or more IO interface DMA controllers 106 capable of transferring data between the drives 105 and the high-speed L0 cache (SRAM) 2214 (write buffer 2214). Another set of IO interface DMA controllers 107 is connected to the host 112 for transferring data between the host 112 and the L0 cache 2214. The Flash interface controller 108, on the other hand, is capable of transferring data between the L0 cache 2214 and the L2 cache (flash devices) 103.

Multiple DMA controllers can be activated at the same time both in the storage IO interface and the Flash interface sides. Thus, it is possible to have simultaneous operations on multiple flash devices, and simultaneous operations on multiple rotational drives.

Data is normally cached in the L0 cache 2214, being the fastest among the available cache levels. Therefore, write buffering (write cache enable) is performed by buffering the write data into the L0 cache 2214. In an embodiment of the invention, the device 2101 may also include the L1 cache 104 and/or L2 cache 103 as available cache levels. The IO interface DMA engine 107 connected between the host 112 and the DMA buses 110 and 111 is responsible for high-speed transfer of data between the host 112 and the L0 cache 2214. There can be multiple IO interface ports connected to a single host and there can be multiple IO interface ports connected to different hosts. In the presence of multiple IO interface to host connections, dedicated engines are available in each IO interface port, allowing simultaneous data transfer operations between hosts and the hybrid device. The engines operate directly on the L0 cache memory 2214, eliminating the need for temporary buffers and the extra data transfer operations associated with them.

For each level of cache, the firmware keeps track of the number of cache lines available for usage. It defines a maximum threshold of unused cache lines, which when reached causes it to either flush some of the used cache lines to the medium or copy them to a different cache level which has more unused cache lines available. When the system reaches that pre-defined threshold of unused L0 cache, the system starts moving data from L0 2214 to L2 cache 103. The L2 cache is slower than the L0 cache but usually has greater capacity. The L2 cache 103 includes arrays of flash devices 109. Flash interface 108 includes multiple DMA engines 115 connected to multiple buses 116, which are in turn connected to the flash devices 109. Multiple operations on different or on the same flash devices can be triggered in the flash interface. Each engine operation involves a source and a destination memory. For L0 to L2 data movements, the flash interface engines copy data directly from the memory location of the source L0 cache to the physical flash blocks of the destination flash. For L2 to L0 data movements, the flash interface engines 115 copy data directly from the physical flash blocks of the source flash to the memory location of the destination L0 cache. For L2 to L1 data movements, the flash interface engines 115 copy data directly from the physical flash blocks of the source flash to the memory location of the destination L1 cache.
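The threshold check can be illustrated with the sketch below. It assumes that the threshold is "reached" when the count of unused L0 cache lines has dropped to the pre-defined value, and that the used lines are tracked in least-recently-used order; the function name `lines_to_move` and the `batch` parameter are hypothetical.

```python
def lines_to_move(total_lines, used_lru_order, threshold, batch):
    """Pick L0 cache lines to move to L2 once unused lines run low.

    used_lru_order -- used line indices, least recently used first (assumed)
    threshold      -- pre-defined count of unused lines that triggers a move
    batch          -- how many lines to move per trigger (assumed parameter)
    """
    unused = total_lines - len(used_lru_order)
    if unused > threshold:
        return []  # still enough unused lines; nothing to move yet
    return used_lru_order[:batch]
```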

Transfers of data from L0 2214 to hard disk drives 105 and vice versa are handled by the DMA controllers of the IO interfaces 106 connected to the hard disk drives 105. These DMA controllers operate directly on the L0 cache memories, again eliminating the need for temporary buffers. Data transfers between L2 103 and the hard disk drives 105 go through L0 2214. This requires synchronization between L2 and L0 to be built into the caching scheme.

Although FIG. 22 shows a system where the rotational drives 105 are outside the hybrid storage device 2101 connected via IO interfaces 106, slightly different architectures can also be used. For example, the rotational drives 105 can be part of the hybrid storage device 2101 itself, connected to the storage controller 2102 via a disk controller. Another option is to connect the rotational drives 105 to an IO controller connected to the hybrid storage controller 2102 through one of the IO interfaces 106 of the controller 2101. Similarly, the connection to the host is not in any way limited to what is shown in FIG. 22. The hybrid storage device 2101 can also attach to the host 112 through an external IO controller. The hybrid storage device 2101 can also be attached directly to the host's network domain. More details of these various configurations can be found in, for example, the figures of commonly-owned and commonly-assigned U.S. Pat. Nos. 8,032,700 and 7,613,876, both entitled “Hybrid multi-tiered caching storage system”.

Write Command

When the firmware receives a Write command from the host 112, the firmware derives the cache control block index (SRAM Index) based on the host LBA. Then the firmware checks the designated cache control block to determine whether the requested LBA is in the L0 cache.
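The description does not give the formula for deriving the SRAM Index. One common choice for a directly-mapped cache, shown here purely as an assumption, is to divide the LBA by the line size and take the result modulo the number of cache lines. The names `cache_index`, `is_hit`, `sectors_per_line` and the `base_lba` field are hypothetical.

```python
def cache_index(lba, sectors_per_line, num_lines):
    """Map a host LBA to a cache control block index (direct-mapped guess)."""
    return (lba // sectors_per_line) % num_lines

def is_hit(lba, control_blocks, sectors_per_line):
    """Check whether the designated control block holds the requested LBA."""
    idx = cache_index(lba, sectors_per_line, len(control_blocks))
    block = control_blocks[idx]
    line_base = lba - lba % sectors_per_line  # first LBA of the cache line
    return block["state"] != "INVLD" and block["base_lba"] == line_base
```

A valid block whose `base_lba` differs from the requested line corresponds to the case, discussed further below, where the cache block is valid but does not contain the correct set of data.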

If the L0 cache state is invalid (INVLD) and there is no ongoing locked operation (NOP), the firmware starts the transfer from host to L0 cache and updates the cache sub-status to H2S. After completion of the host2sram transfer, the firmware updates the cache sub-status to NOP. If the write data uses all of the cache line space, the L0 cache state becomes VFD (e.g. 15036 in FIG. 15B but applicable to the L0 cache instead of the L1 cache); otherwise the L0 cache state becomes VPCD (e.g. 15037). For write buffering into the L0 cache, the L1 states in FIG. 15B will be L0 states instead. For the case when write data uses all of the L0 cache line space, the copy in L2 cache becomes INVLD (e.g. 15038).

If the L0 cache state is valid (VPCD, VFC, VFCPD, VFD, VPCC), there is no ongoing locked operation (NOP), and the associated cache contains the correct set of data, the firmware starts the transfer from host to L0 cache and updates the cache sub-status to H2S (host2sram). After completion of the host2sram transfer, the firmware updates the cache sub-status to NOP.

If the L0 previous cache state is VPCD, there are 4 options: (1) If write data uses all of the cache line space, the L0 cache state becomes VFD (e.g. 15055). (2) If write data is less than the cache line space, there's no more free cache line space, and there is no more clean cache area, the L0 cache state becomes VFD (e.g. 15057). (3) If write data is less than the cache line space and there is still some free cache line space, the L0 cache state becomes VPCD (e.g. 15058). (4) If write data is less than the cache line space, there is no more free cache line space, and there is still some clean cache area, the L0 cache state becomes VFCPD (e.g. 15060).

If the L0 previous cache state is VFC, there are 2 options: (1) If write data uses all of the cache line space, the L0 cache state becomes VFD (e.g. 15080 in FIG. 15C). (2) If write data is less than the cache line space, the L0 cache state becomes VFCPD (e.g. 15081), since not all the cache data were overwritten. For write buffering into the L0 cache, the L1 states in FIG. 15C will be L0 states instead.

If the L0 previous cache state is VFCPD, there are 3 options: (1) If write data uses all of the cache line space, the L0 cache state becomes VFD (e.g. 15087). (2) If write data is less than the cache line space and there is no more clean cache line space, the L0 cache state becomes VFD (e.g. 15088). (3) If write data is less than the cache line space, and there is still some clean cache area, the L0 cache state becomes VFCPD (e.g. 15089).

If the L0 previous cache state is VFD, there is only 1 option: (1) the L0 cache state remains at VFD no matter what the write data size is (e.g. 15105 in FIG. 15D). For write buffering into the L0 cache, the L1 states in FIG. 15D will be L0 states instead.

If the L0 previous cache state is VPCC, there are 3 options: (1) If write data uses all of the cache line space, the L0 cache state becomes VFD (e.g. 15115). (2) If write data is less than the cache line space and there is still some free cache line space, the L0 cache state becomes VPCD (e.g. 15117). (3) If write data is less than the cache line space, there is no more free cache line space, and there is still some clean cache area, the L0 cache state becomes VFCPD (e.g. 15116).
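The L0 state transitions listed above for a host write can be reproduced by one rule if a cache line is modelled as a list of sectors that are each empty, clean or dirty. That per-sector model, and the names `classify` and `host_write`, are assumptions made for this sketch; the description itself only lists the resulting states.

```python
EMPTY, CLEAN, DIRTY = "E", "C", "D"

def classify(sectors):
    """Derive the cache state name from per-sector status (assumed model)."""
    if all(s == EMPTY for s in sectors):
        return "INVLD"
    has_dirty = DIRTY in sectors
    if EMPTY in sectors:                 # partially cached
        return "VPCD" if has_dirty else "VPCC"
    if has_dirty and CLEAN in sectors:   # full, only partly dirty
        return "VFCPD"
    return "VFD" if has_dirty else "VFC"

def host_write(sectors, start, count):
    """Return the line after the host writes `count` sectors from `start`."""
    updated = list(sectors)
    for i in range(start, start + count):
        updated[i] = DIRTY
    return updated
```

For instance, a VPCC line `["C", "C", "E", "E"]` that receives a write to its last two sectors has no free space left but still has a clean area, so it classifies as VFCPD, matching option (3) for VPCC above.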

If the L0 cache state is valid but the associated cache block does not contain the correct set of data (for the case of a directly-mapped cache), the firmware initiates freeing of that cache block.

If that cache is clean, it can be freed instantly without any flush operation. But if the cache is dirty, the firmware gets the associated flash physical location of the data from the LBA2FlashPBA table and initiates copying of the data to the L2 cache, which is faster than flushing to the rotational drive. Then the firmware updates the sub-status to sram2flash. However, if the L2 cache is full, flushing to the rotational drive will be initiated instead, and the sub-status will be set to sram2hdd. Flushing of the L2 cache to rotational drives can be done in the background when the firmware is not busy servicing host commands. After flushing of the L0 cache, the firmware proceeds with the steps below as if the data is not in the L0 cache.
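The choice of how to free a valid cache block that holds the wrong data can be sketched as a small decision function. The name `release_block` is hypothetical, the clean states are assumed to be VFC and VPCC, and the returned strings are the sub-status abbreviations used elsewhere in this description (S2F for sram2flash, S2HDD for sram2hdd).

```python
CLEAN_STATES = {"VFC", "VPCC"}

def release_block(l0_state, l2_full):
    """Return the sub-status to set when freeing a conflicting L0 block."""
    if l0_state in CLEAN_STATES:
        return "NOP"      # clean: freed instantly, no transfer needed
    if l2_full:
        return "S2HDD"    # no room in flash: flush to the rotational drive
    return "S2F"          # preferred: copy to the faster L2 cache
```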

If the data is not in the L0 cache, the firmware requests an available L0 cache. If there is no available L0 cache (the L0 cache is full), the firmware selects an L0 cache victim. If the selected victim is clean, or if it is dirty but consistent with the copy in L2 cache, it is freed instantly. Otherwise, it is flushed to the rotational drive. The cache is invalidated (INVLD) and then assigned to the current command being serviced. Processing of the firmware continues as if the L0 cache state is INVLD (see discussion above).

After the L0 cache state is updated due to a host-write (H2S), the L2 cache state is also updated. For the case when write data occupies only a part of the L0 cache line space and the write data did not cover all the copy in L2 cache, the copy in L2 cache becomes partially valid (VPCC, VPCD), since some parts of the L2 cache copy are invalidated (whether partially or fully dirty previously) (e.g. 15039 in FIG. 15B). For the case when the write data occupies only a part of the L0 cache line space and the write data covered all the copy in L2 cache, the copy in L2 cache becomes INVLD (e.g. 15038).
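The effect of a host write on the L2 copy can be sketched with sets of sector indices. The set-based model and the function name `l2_after_host_write` are assumptions; the "unchanged" outcome, where the write does not touch any sector held in L2, is not spelled out in the description and is included only to make the sketch complete.

```python
def l2_after_host_write(l2_valid, written):
    """Return (remaining_valid_sectors, outcome) for the L2 copy.

    l2_valid -- set of sector indices currently valid in L2 (assumed model)
    written  -- set of sector indices the host just wrote into L0
    """
    remaining = l2_valid - written   # overwritten sectors are stale in L2
    if not remaining:
        return remaining, "INVLD"    # write covered the whole L2 copy
    if remaining == l2_valid:
        return remaining, "unchanged"
    return remaining, "partial"      # L2 copy becomes VPCC or VPCD
```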

Upon completion of host2sram (H2S), the firmware sends the command status to the host and completes the command. But if the write command is of the write FUA (force unit access) type, host2sram (H2S) and sram2hdd (S2HDD) are done first before the command completion status is sent to the host. Once all L0 cache data is written to the HDD, the L0 cache state becomes clean (VFC, VPCC) (e.g. 20060, 20042 in FIG. 20B).
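The difference in ordering between a normal write and a write FUA can be shown with a short sketch. The function name `write_command_steps` and the step label `send_status` are hypothetical; H2S and S2HDD are the sub-status abbreviations used above.

```python
def write_command_steps(fua):
    """List the steps performed before a write command is completed."""
    steps = ["H2S"]               # host data is buffered in the L0 cache
    if fua:
        steps.append("S2HDD")     # FUA: data must reach the drive first
    steps.append("send_status")   # only now is completion reported
    return steps
```

For a normal write, any flush to the rotational drive happens later in the background, which is what makes write buffering pay off.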

In the background, when the interface is not busy, the firmware initiates flushing of L0 cache to L2 cache, L2 cache to rotational drives, and L0 cache to rotational drives.

Flushing Algorithm

For a fully associative cache implementing a write-back policy, flushing is usually done when there is new data to be placed in the cache, but the cache is full and the selected victim data to be evicted from the cache is still dirty. Flushing will clean the dirty cache and allow it to be replaced with new data.

Flushing increases access latency due to the required data transfer from L0 volatile cache to the much slower rotational drive. The addition of L2 nonvolatile cache allows faster transfers from L0 to L2 cache when the L0 cache is full, effectively postponing the flushing operation and allowing it to be more optimized.

To reduce latency and enhance the cache performance, flushing can be done as a background operation. LRU and LFU are the usual algorithms used to identify the victim data candidates, but the addition of a Fastest-to-Flush algorithm takes advantage of the random access performance of the L2 cache. The Fastest-to-Flush algorithm optimizes the flushing operation by selecting dirty victim data that can be written concurrently to the L2 cache, thus minimizing access time. The overhead brought about by flushing of cache can then be reduced by running concurrent flush operations whenever possible. Depending on processor availability, flushing may be scheduled regularly or during idle times when there are no data transfers between the hybrid storage system and the host or external device.
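One way to read the Fastest-to-Flush idea is sketched below: walk the least recently used dirty candidates and keep only those whose target flash devices are all different, so that their writes can run at the same time. This interpretation, the function name `pick_concurrent`, and the (LBA, flash device) pairs are assumptions; the description does not define the selection rule in this detail.

```python
def pick_concurrent(candidates, max_parallel):
    """Select dirty victims that can be written to L2 concurrently.

    candidates   -- (lba, flash_device) pairs, least recently used first
    max_parallel -- number of flash operations that may run at once
    """
    chosen, busy_devices = [], set()
    for lba, device in candidates:
        if len(chosen) >= max_parallel:
            break
        if device not in busy_devices:   # device is free: no serialization
            chosen.append(lba)
            busy_devices.add(device)
    return chosen
```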

Flushing of L0 Cache

In an embodiment of the invention, flushing of the L0 cache will occur only if the copy of data in the L0 cache is more updated than the copy in the rotational drive. This may occur, for example, when a non-FUA write command hits the L0 cache.

Flushing of L0 cache is typically triggered by the following conditions:

1. Eviction caused by shared cache line: In set-associative or directly-mapped caching mode, if the cache or cache set assigned to a specific address is valid but contains other data, that old data must be evicted to give way to the new data that needs to be cached. If the old data is clean, the cache is simply overwritten. If the old data is dirty, the cache is flushed first before writing the new data.

2. L0 cache is full: If an IO command being processed could not obtain a cache due to a cache-full condition, a victim must be selected to give way to the current command. If the victim data is clean, the cache is simply overwritten. If the victim data is dirty, the cache is flushed first before writing the new data.

In either (1) or (2) discussed above, the victim data will be moved to either the L2 cache or the rotational drive. Ideally in this case, the firmware will move the L0 cache data to the L2 cache first, since movement to the L2 cache is faster. In case the L2 cache is full, the firmware will have to move the L0 cache data to the rotational drive.

3. Interface is not busy: Flushing may also be done in the background when the drive is not busy servicing host commands. The L0 cache is flushed directly to the rotational drive first; then, if the number of available L0 caches has reached a pre-defined threshold, data is also copied to the L2 cache, in anticipation of more flushing due to the L0 cache full condition.

When moving data from the L0 cache to the rotational drive, the firmware takes advantage of concurrent drive operations by selecting cache lines that can be flushed in parallel among the least recently used candidates. The firmware also takes into consideration the resulting access type to the destination drives. The firmware queues the requests according to the values of the destination addresses such that the resulting access is a sequential type.
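The queuing described above can be sketched by grouping flush requests per destination drive and ordering each group by destination address. The function name `build_flush_queues` and the (drive, LBA) pairs are hypothetical; different drives can then be serviced in parallel while each drive sees ascending addresses.

```python
from collections import defaultdict

def build_flush_queues(candidates):
    """Group flush requests per drive and sort each queue by address.

    candidates -- (drive_id, destination_lba) pairs for the selected
                  least recently used dirty lines (assumed representation)
    """
    queues = defaultdict(list)
    for drive, lba in candidates:
        queues[drive].append(lba)
    # Ascending addresses make each drive's access pattern sequential.
    return {drive: sorted(lbas) for drive, lbas in queues.items()}
```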

Before the firmware can initiate the flushing operation from the L0 cache to the rotational drive, the firmware must first check if there is an ongoing locked cache operation. If there is an ongoing locked cache operation, the firmware will have to wait until the operation is finished before initiating the data transfer. When the current cache sub-state finally becomes NOP, the cache sub-state will be changed to S2HDD and the L0 cache flushing will start. This change in cache sub-state indicates a new locked cache operation. After the L0 cache is flushed, the cache sub-state goes back to NOP to indicate that the cache is ready for another operation.
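The use of the sub-state as a lock around an L0 flush can be sketched as follows. The dictionary representation of a cache line and the names `try_start_flush` and `finish_flush` are assumptions; the mapping from dirty to clean states follows the statement earlier in this description that a line written to the HDD becomes VFC or VPCC.

```python
CLEAN_AFTER_FLUSH = {"VFD": "VFC", "VFCPD": "VFC", "VPCD": "VPCC"}

def try_start_flush(line):
    """Begin an L0-to-drive flush only if no locked operation is ongoing."""
    if line["sub_state"] != "NOP":
        return False               # another transfer owns the line; wait
    line["sub_state"] = "S2HDD"    # marks a new locked cache operation
    return True

def finish_flush(line):
    """Mark the line clean and release it for the next operation."""
    line["state"] = CLEAN_AFTER_FLUSH.get(line["state"], line["state"])
    line["sub_state"] = "NOP"
```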

In another embodiment of the invention, the rotational drives (HDD) are omitted. Therefore, in this embodiment, the flash devices 109 are the main storage (the main non-volatile storage) and can also serve as an L2 cache. The L0 cache is coupled to the host and has the same functionality as the L1 cache, but the L0 cache is faster.

The above discussion on the algorithm for performing the data flow from the host to L1 to L2 and L3 (and vice versa) can also be applied to perform the data flow from the host to L0 to L1 to L2 and L3 (and vice versa), or from the host to L0 and L2 (and vice versa), or from the host to L0 to L2 and to L3 (and vice versa).

The foregoing described embodiments of the invention are provided as illustrations and descriptions. They are not intended to limit the invention to the precise form described. In particular, it is contemplated that functional implementation of the invention described herein may be implemented equivalently in hardware, software, firmware, and/or other available functional components or building blocks, and that networks may be wired, wireless, or a combination of wired and wireless.

It is also within the scope of the present invention to implement a program or code that can be stored in a machine-readable or computer-readable medium to permit a computer to perform any of the inventive techniques described above, or a program or code that can be stored in an article of manufacture that includes a computer readable medium on which computer-readable instructions for carrying out embodiments of the inventive techniques are stored. Other variations and modifications of the above-described embodiments and methods are possible in light of the teaching discussed herein.

The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.

These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.

We claim:
 1. Apparatus for storing data comprising: a write buffering scheme comprising a plurality of cache devices, wherein write data is moved from a first cache device to a second cache device when a pre-defined threshold of unused cache lines is reached in the first cache device; wherein the second cache is slower than the first cache; wherein the second cache has a greater storage capacity than the first cache.